A Systematic Analysis of Chunking Strategies for Reliable Question Answering
Sofia Bennani, Charles Moslonka

TL;DR
This paper systematically evaluates how different document chunking strategies affect the reliability and cost-efficiency of Retrieval-Augmented Generation systems, providing practical guidelines for industrial deployment.
Contribution
It offers an end-to-end analysis of chunking methods, chunk sizes, and overlaps, identifying cost-effective configurations for reliable QA system deployment.
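Fixed-size token chunking with overlap, one of the configurations the paper varies, can be sketched as a sliding window. This is a minimal illustration, not the authors' code. Whitespace-split words stand in for a real subword tokenizer, and the function name `token_chunks` is hypothetical.

```python
from typing import List


def token_chunks(tokens: List[str], size: int, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of `size` tokens.

    Consecutive windows share `overlap` tokens, so each window starts
    `size - overlap` tokens after the previous one. The last window may be shorter.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []
    stride = size - overlap
    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        # Stop once this window reaches the end of the sequence.
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


# Whitespace words stand in for model tokens (assumption for illustration).
words = "one two three four five six seven".split()
print(token_chunks(words, size=3))
# [['one', 'two', 'three'], ['four', 'five', 'six'], ['seven']]
print(token_chunks(words, size=3, overlap=1))
# [['one', 'two', 'three'], ['three', 'four', 'five'], ['five', 'six', 'seven']]
```

With overlap=1 the same 7 words produce 9 indexed tokens instead of 7, because each window repeats the last token of the previous one.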
Findings
Chunk overlap provides no measurable benefit and increases indexing cost.
Sentence chunking is the most cost-effective method, matching semantic chunking up to ~5k tokens.
A "context cliff" reduces quality beyond ~2.5k tokens.
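A back-of-the-envelope count suggests why overlap raises indexing cost. Assume a document of N tokens split into windows of c tokens that share o tokens, so the stride is c − o. The number of chunks k and the total indexed tokens are then roughly as follows. This is a generic estimate, not the paper's cost model.

```latex
k(N, c, o) = \left\lceil \frac{N - o}{c - o} \right\rceil
\quad (N \ge c,\; 0 \le o < c),
\qquad
\text{indexed tokens} \approx N \cdot \frac{c}{c - o}
```

For example, with hypothetical values N = 10,000, c = 512, and o = 128, overlap yields ⌈9872/384⌉ = 26 chunks instead of ⌈10000/512⌉ = 20. It also yields roughly 13k indexed tokens instead of 10k, and every extra token must be encoded and stored.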
Abstract
We study how document chunking choices impact the reliability of Retrieval-Augmented Generation (RAG) systems in industry. While practice often relies on heuristics, our end-to-end evaluation on Natural Questions systematically varies chunking method (token, sentence, semantic, code), chunk size, overlap, and context length. We use a standard industrial setup: SPLADE retrieval and a Mistral-8B generator. We derive actionable lessons for cost-efficient deployment: (i) overlap provides no measurable benefit and increases indexing cost; (ii) sentence chunking is the most cost-effective method, matching semantic chunking up to ~5k tokens; (iii) a "context cliff" reduces quality beyond ~2.5k tokens; and (iv) optimal context depends on the goal (semantic quality peaks at small contexts; exact match at larger ones).
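Sentence chunking, which the abstract reports as the most cost-effective method, can be sketched as greedy packing of whole sentences under a token budget. This is a hedged illustration rather than the authors' implementation. The regex sentence splitter, the word-count proxy for tokens, and the names `split_sentences` and `sentence_chunks` are all assumptions.

```python
import re
from typing import List

# Naive splitter: break after ".", "!" or "?" followed by whitespace
# (an assumption; production pipelines usually use a trained segmenter).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def sentence_chunks(text: str, max_tokens: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens words.

    A single sentence longer than max_tokens becomes its own oversized chunk,
    so sentences are never cut mid-way.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())  # word count stands in for a real tokenizer
        # Flush the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "The cat sat. It was warm. Then it rained heavily today. The end."
print(sentence_chunks(text, max_tokens=6))
# ['The cat sat. It was warm.', 'Then it rained heavily today.', 'The end.']
```

Unlike fixed token windows, chunk boundaries always fall between sentences, at the cost of chunks that vary in length.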
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Natural Language Processing Techniques
