SmartChunk Retrieval: Query-Aware Chunk Compression with Planning for Efficient Document RAG
Xuechen Zhang, Koustava Goswami, Samet Oymak, Jiasi Chen, Nedim Lipka

TL;DR
SmartChunk retrieval introduces a query-adaptive, reinforcement learning-based framework for dynamic document chunking, improving retrieval accuracy and efficiency in long-document question answering tasks.
Contribution
It proposes a novel planner with reinforcement learning to adaptively select chunk abstraction levels, enhancing retrieval quality and scalability over fixed strategies.
Findings
Outperforms state-of-the-art RAG baselines on multiple QA benchmarks.
Reduces retrieval cost while maintaining high accuracy.
Demonstrates strong scalability with larger corpora and diverse datasets.
Abstract
Retrieval-augmented generation (RAG) has strong potential for producing accurate and factual outputs by combining language models (LMs) with evidence retrieved from large text corpora. However, current pipelines are limited by static chunking and flat retrieval: documents are split into short, predetermined, fixed-size chunks, embeddings are retrieved uniformly, and generation relies on whatever chunks are returned. This design brings challenges, as retrieval quality is highly sensitive to chunk size, often introduces noise from irrelevant or misleading chunks, and scales poorly to large corpora. We present SmartChunk retrieval, a query-adaptive framework for efficient and robust long-document question answering (QA). SmartChunk uses (i) a planner that predicts the optimal chunk abstraction level for each query, and (ii) a lightweight compression module that produces high-level chunk…
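The contrast the abstract draws between fixed-size chunking and per-query selection of a chunk level can be illustrated with a toy sketch. This is not the authors' implementation. The function names, the chunk sizes, the bag-of-words similarity (standing in for an embedding model), and the length-based `toy_planner` heuristic (standing in for the learned planner) are all assumptions made for illustration, and the compression module is not modeled.

```python
import math
from collections import Counter

def chunk_by_size(words, size):
    """Split a list of words into consecutive chunks of at most `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_levels(text, sizes=(8, 16, 32)):
    """Index the same document at several granularities (level 0 is the finest).
    The sizes are arbitrary illustration values, not taken from the paper."""
    words = text.split()
    return {level: chunk_by_size(words, size) for level, size in enumerate(sizes)}

def bow_cosine(a, b):
    """Cosine similarity of bag-of-words vectors; a stand-in for embedding similarity."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def toy_planner(query):
    """Hypothetical planner stub returning (min_level, max_level) for a query.
    Assumption: short queries use fine chunks, longer queries use coarser ones."""
    return (0, 0) if len(query.split()) <= 6 else (1, 2)

def retrieve(query, levels, top_k=2):
    """Rank only the chunks whose level falls inside the planner's predicted range."""
    lo, hi = toy_planner(query)
    candidates = [(lvl, chunk) for lvl in range(lo, hi + 1) for chunk in levels[lvl]]
    ranked = sorted(candidates, key=lambda lc: bow_cosine(query, lc[1]), reverse=True)
    return ranked[:top_k]
```

Calling `retrieve(query, build_levels(document_text))` returns `(level, chunk)` pairs. Restricting the candidate set to the predicted levels is what shrinks the search space in this sketch; a static pipeline would instead search a single fixed level for every query.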
Peer Reviews
Decision: ICLR 2026 Poster
The paper is well written, and the method is well ablated, with each part showing the trade-offs. The method performs comparably to methods such as RAPTOR, GRAG, and MAL RAG. The Chunk Compression Encoder is especially interesting: it is surprising that it boosts results compared with directly embedding the document. The method also shows some generalization.
Unlike some previous methods such as RAPTOR, SmartChunk requires training for both the planner and the compression encoder. One important baseline that is currently missing is retrieving from the database with differing chunk levels, i.e., the model retrieves from all the chunks together (at different token levels), where the tokens can be embedded normally and also via the chunk compression encoder.
1. The paper identifies the most significant pain point of current advanced RAG systems: high costs. 2. In response to the exorbitant costs of LLM-based summarization, the paper proposes the chunk compression encoder. 3. The paper implements a dynamic RAG system, claiming to achieve a trade-off between accuracy and cost.
1. The motivation behind the planner is extremely difficult to comprehend. The core assumption of the paper is that a query-based planner can predict, prior to retrieval, the minimum and maximum chunk levels required to answer a question. However, given that the information distribution within documents is unknown, such predictions lack sufficient informational support and exhibit low scientific rigor. 2. Why must the information required to answer a question be precisely distributed across a co
1. Practical relevance: Addresses a real bottleneck in current RAG systems where fixed chunking strategies perform poorly across diverse queries and documents. 2. Technical innovation: The STITCH training methodology is novel and addresses genuine challenges in training planners with noisy pseudo-labels and multi-objective rewards. 3. Comprehensive evaluation: Thorough experimental validation across multiple datasets with different characteristics, including out-of-domain evaluation. 4. Efficien
1. Complexity vs. gains: The STITCH training procedure adds significant complexity to achieve what appears to be modest improvements over simpler baselines. The cost-benefit trade-off may not justify the added complexity in all scenarios. 2. Limited theoretical analysis: While the empirical results are strong, the paper lacks theoretical analysis of when and why the approach should work better than alternatives. 3. Reproducibility concerns: The STITCH training involves multiple stages with vario
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
