CLAP: Coreference-Linked Augmentation for Passage Retrieval
Huanwei Xu, Lin Xu, Liang Yuan

TL;DR
CLAP is a novel LLM-based passage augmentation method that improves dense retrieval by maintaining coreference coherence and aligning pseudo-queries with retriever representations, leading to significant performance gains across domains.
Contribution
Introduces CLAP, a lightweight, domain-agnostic passage expansion framework that resolves coreference and generates localized pseudo-queries for enhanced retrieval.
Findings
Up to a 20.68% absolute improvement in nDCG@10.
Consistent gains across different retriever strengths and domains.
Outperforms traditional LLM-based expansion methods in out-of-domain settings.
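For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 is commonly computed and of what an "absolute" gain means, as opposed to a relative one. The function names and the example scores (0.40 and 0.50) are illustrative assumptions, not values or code from the paper.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Rank positions are 1-based, so position i (0-based) is discounted by log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances=None, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: labels of the returned documents, in ranked order.
    all_relevances: labels of every judged document for the query; if omitted,
    the ideal ranking is built from the returned documents only (a simplification).
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical scores, used only to contrast absolute and relative gains.
baseline, augmented = 0.40, 0.50
absolute_gain = augmented - baseline               # difference in nDCG points
relative_gain = (augmented - baseline) / baseline  # fraction of the baseline
```

With these made-up scores the absolute gain is about 0.10 (10 points) while the relative gain is about 25%, which is why the distinction matters when reading the headline figure.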
Abstract
Large Language Model (LLM)-based passage expansion has shown promise for enhancing first-stage retrieval, but often underperforms with dense retrievers due to semantic drift and misalignment with their pretrained semantic space. Beyond this, only a portion of a passage is typically relevant to a query, while the rest introduces noise, an issue compounded by chunking techniques that break coreference continuity. We propose Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent chunks, resolves coreference chains, and generates localized pseudo-queries aligned with dense retriever representations. A simple fusion of global topical signals and fine-grained subtopic signals achieves robust performance across domains. CLAP yields consistent gains even as retriever strength increases, enabling dense…
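The abstract does not spell out how the global and fine-grained signals are fused. As a rough illustration only, one plausible form is a weighted sum of a passage-level similarity and the best chunk-level pseudo-query similarity. Everything in this sketch is an assumption: the function names, the use of cosine similarity, the max-pooling over pseudo-queries, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def fused_score(query_vec, passage_vec, pseudo_query_vecs, alpha=0.5):
    """Hypothetical fusion of a global and a local relevance signal.

    query_vec: embedding of the user query.
    passage_vec: embedding of the whole passage (global topical signal).
    pseudo_query_vecs: embeddings of pseudo-queries generated per chunk
        (fine-grained subtopic signals).
    alpha: assumed interpolation weight between the two signals.
    """
    global_score = cosine(query_vec, passage_vec)
    # Take the best-matching pseudo-query; fall back to the global score if none exist.
    local_score = max(
        (cosine(query_vec, pq) for pq in pseudo_query_vecs),
        default=global_score,
    )
    return alpha * global_score + (1 - alpha) * local_score

# Toy 2-D vectors standing in for dense retriever embeddings.
score = fused_score([1.0, 0.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the passage as a whole matches the query only partially, while one pseudo-query matches it exactly, so the fused score lands between the two. Other fusion choices (rank fusion, score normalization, learned weights) would be equally consistent with the abstract's wording.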
