CHOP: Chunkwise Context-Preserving Framework for RAG on Multi Documents
Hyunseok Park, Jihyeon Kim, Jongeun Kim, and Dongsik Yoon

TL;DR
CHOP is a framework that improves retrieval accuracy in RAG systems by iteratively evaluating chunk relevance and preserving context, reducing confusion among similar documents.
Contribution
It introduces a novel chunkwise evaluation and reconstruction method using LLMs, enhancing document discrimination and retrieval precision.
Findings
Achieves a Top-1 Hit Rate of 90.77% on benchmark datasets.
Reduces semantic conflicts among similar documents.
Enhances ranking quality metrics in retrieval tasks.
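For readers unfamiliar with the headline metric, here is a minimal sketch of how a Top-k hit rate is commonly computed. It assumes the usual definition (the fraction of queries whose relevant document appears among the top k retrieved results); the paper's exact evaluation protocol is not given here, and the function name and toy identifiers are hypothetical.

```python
from typing import List, Sequence


def top_k_hit_rate(
    ranked_ids: Sequence[Sequence[str]],
    relevant_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose relevant document is in the top-k results.

    Assumption: each query has exactly one relevant document. This is a
    common convention, not necessarily the setup used in the paper.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("one ranked list is required per query")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant in ranked[:k]
    )
    return hits / len(ranked_ids)


# Toy data for illustration only; these are not results from the paper.
toy_rankings: List[List[str]] = [["doc_a", "doc_b"], ["doc_c", "doc_d"]]
toy_relevant: List[str] = ["doc_a", "doc_d"]
toy_top1 = top_k_hit_rate(toy_rankings, toy_relevant, k=1)
```

With k=1 this is the Top-1 hit rate: a query counts as a hit only if the retriever ranks the correct document first.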
Abstract
Retrieval-Augmented Generation (RAG) systems lose retrieval accuracy when similar documents coexist in the vector database, which introduces irrelevant information, hallucinations, and factual errors into the generated output. To alleviate this issue, we propose CHOP, a framework that iteratively evaluates chunk relevance with Large Language Models (LLMs) and progressively reconstructs documents by determining their association with specific topics or query types. CHOP integrates two key components: the CNM-Extractor, which generates compact per-chunk signatures capturing categories, key nouns, and model names, and the Continuity Decision Module, which preserves contextual coherence by deciding whether consecutive chunks belong to the same document flow. By prefixing each chunk with context-aware metadata, CHOP reduces semantic conflicts among similar documents and enhances retriever discrimination. Experiments on…
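The chunk-prefixing idea described in the abstract can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: in CHOP both components are LLM-driven, whereas the `toy_extract` and `toy_is_continuation` functions below are hypothetical keyword heuristics standing in for them. The prefix format and the rule that a continuing chunk inherits the previous chunk's category and model names are also assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChunkSignature:
    """Compact per-chunk signature: category, key nouns, model names."""
    category: str
    key_nouns: List[str]
    model_names: List[str]

    def as_prefix(self) -> str:
        # Hypothetical prefix layout; the paper's format is not specified here.
        return (
            f"[category: {self.category}; "
            f"nouns: {', '.join(self.key_nouns)}; "
            f"models: {', '.join(self.model_names)}]"
        )


def prefix_chunks(
    chunks: List[str],
    extract: Callable[[str], ChunkSignature],
    is_continuation: Callable[[str, str], bool],
) -> List[str]:
    """Prefix every chunk with metadata before it is embedded and indexed."""
    prefixed: List[str] = []
    prev_chunk: Optional[str] = None
    prev_sig: Optional[ChunkSignature] = None
    for chunk in chunks:
        sig = extract(chunk)
        if prev_chunk is not None and prev_sig is not None and is_continuation(prev_chunk, chunk):
            # Assumption: a chunk in the same document flow inherits the
            # previous chunk's category and model names.
            sig = ChunkSignature(prev_sig.category, sig.key_nouns, prev_sig.model_names)
        prefixed.append(f"{sig.as_prefix()}\n{chunk}")
        prev_chunk, prev_sig = chunk, sig
    return prefixed


def toy_extract(chunk: str) -> ChunkSignature:
    """Stand-in for the CNM-Extractor (keyword heuristic, not an LLM call)."""
    tokens = [t.strip(".,") for t in chunk.split()]
    # Treat alphanumeric tokens such as "X100" as model names.
    models = [t for t in tokens
              if any(c.isdigit() for c in t) and any(c.isalpha() for c in t)]
    category = "installation" if "install" in chunk.lower() else "general"
    nouns = [t.lower() for t in tokens if len(t) > 6 and t not in models][:2]
    return ChunkSignature(category, nouns, models)


def toy_is_continuation(prev: str, curr: str) -> bool:
    """Stand-in for the Continuity Decision Module (connective-word check)."""
    words = curr.split()
    return bool(words) and words[0].lower() in {"then", "next", "afterwards"}


demo_chunks = [
    "Install the X100 printer driver.",
    "Then restart the computer.",
    "The Z200 scanner supports duplex mode.",
]
demo_prefixed = prefix_chunks(demo_chunks, toy_extract, toy_is_continuation)
```

In this toy run the second chunk mentions no product on its own, but because it is judged to continue the first, its prefix still carries the X100 context. That is the kind of signal that could help a retriever separate it from similar text about other products.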
