| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Beltran Prieto, Luis Antonio | |
| dc.contributor.author | Tayo, Aderiye Oluwasijibomi | |
| dc.date.accessioned | 2025-12-10T23:09:51Z | |
| dc.date.available | 2025-12-10T23:09:51Z | |
| dc.date.issued | 2024-10-27 | |
| dc.identifier | Elektronický archiv Knihovny UTB | |
| dc.identifier.uri | http://hdl.handle.net/10563/57753 | |
| dc.description.abstract | This thesis explores the impact of different text chunking strategies on the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing their effectiveness in preserving context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which leverages document structure and semantic similarity to optimize chunk formation. Empirical results demonstrate that MASC outperforms traditional methods across key evaluation metrics, offering improved accuracy, relevance, and faithfulness in LLM-generated responses. | |
| dc.format | 63 | |
| dc.language.iso | en | |
| dc.publisher | Univerzita Tomáše Bati ve Zlíně | |
| dc.rights | No restrictions | |
| dc.subject | RAG | cs |
| dc.subject | LLM | cs |
| dc.subject | Semantic | cs |
| dc.subject | NLP | cs |
| dc.subject | Chunking | cs |
| dc.subject | RAG | en |
| dc.subject | LLM | en |
| dc.subject | Semantic | en |
| dc.subject | NLP | en |
| dc.subject | Chunking | en |
| dc.title | Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy | |
| dc.title.alternative | Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy | |
| dc.type | Master's thesis | en |
| dc.contributor.referee | Malina, Marek | |
| dc.date.accepted | 2025-06-18 | |
| dc.description.abstract-translated | This thesis explores the impact of different text chunking strategies on the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing their effectiveness in preserving context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which leverages document structure and semantic similarity to optimize chunk formation. Empirical results demonstrate that MASC outperforms traditional methods across key evaluation metrics, offering improved accuracy, relevance, and faithfulness in LLM-generated responses. | |
| dc.description.department | Ústav informatiky a umělé inteligence | |
| dc.thesis.degree-discipline | Software Engineering | cs |
| dc.thesis.degree-discipline | Software Engineering | en |
| dc.thesis.degree-grantor | Univerzita Tomáše Bati ve Zlíně. Fakulta aplikované informatiky | cs |
| dc.thesis.degree-grantor | Tomas Bata University in Zlín. Faculty of Applied Informatics | en |
| dc.thesis.degree-name | Ing. | |
| dc.thesis.degree-program | Information Technologies | cs |
| dc.thesis.degree-program | Information Technologies | en |
| dc.identifier.stag | 70159 | |
| dc.date.submitted | 2025-06-02 | |