Intent-Driven Dynamic Chunking: Segmenting Documents to Reflect Predicted Information Needs
Christos Koutsiaris

TL;DR
This paper presents Intent-Driven Dynamic Chunking (IDC), a method that segments documents around predicted user intents: a large language model generates likely queries for a document, and a dynamic programming algorithm then selects the chunk boundaries. The reported result is higher top-1 retrieval accuracy with fewer chunks than traditional chunking methods.
Contribution
The paper introduces IDC, a new intent-aware segmentation approach that leverages LLMs and dynamic programming to optimize document chunks for better information retrieval.
Findings
Outperformed traditional chunking methods on five datasets
Improved top-1 retrieval accuracy by 5% to 67%
Produced 40-60% fewer chunks with high answer coverage
Abstract
Breaking long documents into smaller segments is a fundamental challenge in information retrieval. Whether for search engines, question-answering systems, or retrieval-augmented generation (RAG), effective segmentation determines how well systems can locate and return relevant information. However, traditional methods, such as fixed-length or coherence-based segmentation, ignore user intent, leading to chunks that split answers or contain irrelevant noise. We introduce Intent-Driven Dynamic Chunking (IDC), a novel approach that uses predicted user queries to guide document segmentation. IDC leverages a Large Language Model to generate likely user intents for a document and then employs a dynamic programming algorithm to find the globally optimal chunk boundaries. This represents a novel application of DP to intent-aware segmentation that avoids greedy pitfalls. We evaluated IDC on six…
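The two-stage idea in the abstract (predicted intents, then dynamic programming over chunk boundaries) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's formulation: the intents are passed in as plain strings rather than generated by an LLM, `token_overlap` is a toy stand-in for whatever relevance measure IDC uses, and the objective (best-intent relevance per chunk minus a hypothetical `chunk_penalty`) and the `max_len` limit are invented parameters.

```python
from typing import Callable, List, Sequence, Tuple


def token_overlap(chunk: str, intent: str) -> float:
    """Toy relevance: Jaccard overlap of lowercase tokens (assumed stand-in)."""
    a, b = set(chunk.lower().split()), set(intent.lower().split())
    return len(a & b) / len(a | b) if a and b else 0.0


def segment(
    sentences: Sequence[str],
    intents: Sequence[str],
    max_len: int = 4,            # hypothetical cap on sentences per chunk
    chunk_penalty: float = 0.1,  # hypothetical cost per chunk, discourages over-splitting
    relevance: Callable[[str, str], float] = token_overlap,
) -> List[Tuple[int, int]]:
    """Return chunk spans [start, end) that maximize the assumed objective."""
    n = len(sentences)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for the first i sentences
    back = [0] * (n + 1)              # back[i]: start index of the last chunk ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            text = " ".join(sentences[start:end])
            # Score a candidate chunk by its best-matching predicted intent.
            score = max((relevance(text, q) for q in intents), default=0.0)
            total = best[start] + score - chunk_penalty
            if total > best[end]:
                best[end], back[end] = total, start
    # Walk the back-pointers from the end of the document to recover the spans.
    spans: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        spans.append((back[end], end))
        end = back[end]
    return spans[::-1]


if __name__ == "__main__":
    doc = [
        "Install the package with pip.",
        "Then import it in your script.",
        "The license is MIT.",
        "Contributions are welcome.",
    ]
    predicted = ["how to install the package", "what is the license"]
    for start, end in segment(doc, predicted, max_len=2):
        print(start, end, " ".join(doc[start:end]))
```

Because every candidate boundary is scored against the best solution for the prefix before it, the search considers all segmentations within `max_len` instead of committing to boundaries one at a time, which is the contrast with greedy splitting that the abstract draws.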
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems
