Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy


dc.contributor.advisor Beltran Prieto, Luis Antonio
dc.contributor.author Tayo, Aderiye Oluwasijibomi
dc.date.accessioned 2025-12-10T23:09:51Z
dc.date.available 2025-12-10T23:09:51Z
dc.date.issued 2024-10-27
dc.identifier Elektronický archiv Knihovny UTB
dc.identifier.uri http://hdl.handle.net/10563/57753
dc.description.abstract This thesis explores the impact of different text chunking strategies on the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing their effectiveness in preserving context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which leverages document structure and semantic similarity to optimize chunk formation. Empirical results demonstrate that MASC outperforms traditional methods across key evaluation metrics, offering improved accuracy, relevance, and faithfulness in LLM-generated responses.
dc.format 63
dc.language.iso en
dc.publisher Univerzita Tomáše Bati ve Zlíně
dc.rights Bez omezení (no restrictions)
dc.subject RAG cs
dc.subject LLM cs
dc.subject Semantic cs
dc.subject NLP cs
dc.subject Chunking cs
dc.subject RAG en
dc.subject LLM en
dc.subject Semantic en
dc.subject NLP en
dc.subject Chunking en
dc.title Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy
dc.title.alternative Analysis of Chunking Strategies for LLM Applications and Proposal of a New Strategy
dc.type diplomová práce (master's thesis) cs
dc.contributor.referee Malina, Marek
dc.date.accepted 2025-06-18
dc.description.abstract-translated This thesis explores the impact of different text chunking strategies on the performance of Large Language Models (LLMs) in applications such as retrieval-augmented generation (RAG) and semantic search. It presents a comparative evaluation of sentence-based, recursive, and semantic chunking methods, analyzing their effectiveness in preserving context and meaning. Building on these insights, the thesis introduces a novel hybrid approach, Markdown-Aware Semantic Chunking (MASC), which leverages document structure and semantic similarity to optimize chunk formation. Empirical results demonstrate that MASC outperforms traditional methods across key evaluation metrics, offering improved accuracy, relevance, and faithfulness in LLM-generated responses.
dc.description.department Ústav informatiky a umělé inteligence (Department of Informatics and Artificial Intelligence)
dc.thesis.degree-discipline Software Engineering cs
dc.thesis.degree-discipline Software Engineering en
dc.thesis.degree-grantor Univerzita Tomáše Bati ve Zlíně. Fakulta aplikované informatiky cs
dc.thesis.degree-grantor Tomas Bata University in Zlín. Faculty of Applied Informatics en
dc.thesis.degree-name Ing.
dc.thesis.degree-program Information Technologies cs
dc.thesis.degree-program Information Technologies en
dc.identifier.stag 70159
dc.date.submitted 2025-06-02
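
The abstract above describes MASC as combining Markdown document structure with semantic similarity when forming chunks. The thesis text itself is not attached to this record, so the following Python sketch only illustrates that general idea and is not the author's implementation: sections are split at Markdown headings, and adjacent sentences within a section are merged while a similarity score stays above a threshold and the chunk stays within a size budget. The size budget, the threshold, and the token-overlap similarity function (a stand-in for embedding cosine similarity) are all illustrative assumptions.

import re

MAX_CHUNK_CHARS = 800        # assumed per-chunk size budget
SIMILARITY_THRESHOLD = 0.2   # assumed merge threshold

def split_by_markdown_headings(text: str) -> list[str]:
    """Split a Markdown document into sections at heading lines (#, ##, ...)."""
    sections, current = [], []
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections

def split_sentences(section: str) -> list[str]:
    """Naive sentence splitter; a real pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", section) if s.strip()]

def similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) score, a stand-in for embedding cosine similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0

def masc_like_chunks(text: str) -> list[str]:
    """Greedily merge adjacent sentences inside each Markdown section while they
    remain similar and the chunk stays within the size budget."""
    chunks = []
    for section in split_by_markdown_headings(text):
        buffer = ""
        for sentence in split_sentences(section):
            fits = len(buffer) + len(sentence) + 1 <= MAX_CHUNK_CHARS
            if buffer and fits and similarity(buffer, sentence) >= SIMILARITY_THRESHOLD:
                buffer += " " + sentence
            else:
                if buffer:
                    chunks.append(buffer)
                buffer = sentence
        if buffer:
            chunks.append(buffer)
    return chunks

if __name__ == "__main__":
    doc = ("# Introduction\n"
           "Chunking splits long documents into retrievable pieces. "
           "Good chunking preserves context and meaning.\n"
           "## Method\n"
           "We compare sentence-based, recursive, and semantic strategies.")
    for chunk in masc_like_chunks(doc):
        print(repr(chunk))

Running the script prints chunks that never cross a Markdown heading boundary; in the thesis, the merge decision would presumably come from sentence embeddings rather than token overlap.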

