Structure and Diversity Aware Context Bubble Construction for Enterprise Retrieval Augmented Systems
Amir Khurshid, Abhishek Sehgal

TL;DR
This paper introduces a structure-aware, diversity-constrained method for constructing coherent context bundles in enterprise retrieval systems. Within a fixed token limit, it improves relevance and coverage while reducing redundancy.
Contribution
It proposes a novel framework that leverages document structure and diversity constraints to assemble compact, informative context sets, outperforming traditional top-k retrieval methods.
Findings
Reduces redundant context significantly
Improves coverage of secondary facets
Enhances answer quality and citation faithfulness
Abstract
Large language model (LLM) contexts are typically constructed using retrieval-augmented generation (RAG), which involves ranking and selecting the top-k passages. This approach fragments the information graph implied by document structure, over-retrieves and duplicates content, and supplies insufficient query context, missing second- and third-order facets. In this paper, a structure-informed and diversity-constrained context bubble construction framework is proposed that assembles coherent, citable bundles of spans under a strict token budget. The method preserves and exploits inherent document structure by organising multi-granular spans (e.g., sections and rows) and using task-conditioned structural priors to guide retrieval. Starting from high-relevance anchor spans, a context bubble is constructed through constrained selection that balances query relevance, marginal coverage, and…
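The abstract is cut off before it finishes describing the selection objective, so the following is only a minimal sketch of budgeted, coverage-aware selection in general, not the authors' algorithm. Everything in it is an assumption made for illustration: the `Span` fields, the additive score (relevance plus a weighted count of newly covered facets), the per-section cap standing in for a diversity constraint, and the names `build_bubble`, `coverage_weight`, and `per_section_cap`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; field names are illustrative assumptions."""
    span_id: str
    section: str        # structural unit the span came from (e.g. a section or table)
    tokens: int         # token cost of including the span
    relevance: float    # query relevance score from any upstream ranker
    facets: frozenset   # query facets the span covers


def build_bubble(spans, budget, per_section_cap=2, coverage_weight=1.0):
    """Greedily pick spans under a token budget.

    Assumed score: relevance + coverage_weight * (number of facets not yet covered).
    Assumed diversity constraint: at most per_section_cap spans per section.
    """
    selected, covered, used = [], set(), 0
    per_section = {}
    remaining = list(spans)
    while remaining:
        best, best_score = None, float("-inf")
        for s in remaining:
            if used + s.tokens > budget:
                continue  # would exceed the token budget
            if per_section.get(s.section, 0) >= per_section_cap:
                continue  # section already contributes its maximum share
            marginal_gain = len(s.facets - covered)
            score = s.relevance + coverage_weight * marginal_gain
            if score > best_score:
                best, best_score = s, score
        if best is None:
            break  # nothing else fits the constraints
        selected.append(best)
        covered |= best.facets
        used += best.tokens
        per_section[best.section] = per_section.get(best.section, 0) + 1
        remaining.remove(best)
    return selected


def top_k(spans, k):
    """Plain top-k baseline: highest relevance first, ignoring structure and overlap."""
    return sorted(spans, key=lambda s: s.relevance, reverse=True)[:k]
```

With made-up data, two near-duplicate spans covering the same facet would both appear in the top-k result, while the greedy selection replaces the second with a lower-ranked span that covers a new facet, which is the kind of behaviour the abstract contrasts with top-k retrieval.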
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks
