Association Is Not Similarity: Learning Corpus-Specific Associations for Multi-Hop Retrieval
Jason Dury

TL;DR
The paper introduces Association-Augmented Retrieval (AAR), a lightweight reranking method that improves multi-hop passage retrieval by learning corpus-specific associations between passages, raising HotpotQA passage Recall@5 from 0.831 to 0.916.
Contribution
AAR is an efficient reranking approach that trains a small MLP (4.2M parameters) to capture corpus-specific associations between passages, improving multi-hop retrieval without requiring large language models.
Findings
AAR improves HotpotQA passage Recall@5 from 0.831 to 0.916 (+8.6 points).
AAR achieves +10.1 points on MuSiQue in the transductive setting.
Retrieval gains translate to +6.4 exact-match points in downstream QA.
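For readers unfamiliar with the metric behind these numbers, the sketch below shows one common way to compute passage Recall@k. It assumes that Recall@k is the fraction of a question's gold supporting passages that appear among the top k retrieved passages, averaged over questions; the paper's exact definition may differ. The function names and the toy passage IDs are illustrative, not taken from the paper.

```python
def recall_at_k(ranked_ids, gold_ids, k=5):
    """Fraction of gold passages found in the top-k ranked passages.

    Assumed definition; the paper may compute Recall@5 differently.
    """
    gold = set(gold_ids)
    if not gold:
        return 0.0
    hits = gold & set(ranked_ids[:k])
    return len(hits) / len(gold)


def mean_recall_at_k(runs, k=5):
    """Average Recall@k over (ranked_ids, gold_ids) pairs, one per question."""
    return sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Toy example: a two-hop question whose second gold passage ranks sixth.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
gold = ["p2", "p6"]
print(recall_at_k(ranked, gold, k=5))  # 0.5
```

Under this definition, a multi-hop question scores below 1.0 whenever any one of its supporting passages falls outside the top k, which is the failure case a reranker like AAR targets.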
Abstract
Dense retrieval systems rank passages by embedding similarity to a query, but multi-hop questions require passages that are associatively related through shared reasoning chains. We introduce Association-Augmented Retrieval (AAR), a lightweight transductive reranking method that trains a small MLP (4.2M parameters) to learn associative relationships between passages in embedding space using contrastive learning on co-occurrence annotations. At inference time, AAR reranks an initial dense retrieval candidate set using bi-directional association scoring. On HotpotQA, AAR improves passage Recall@5 from 0.831 to 0.916 (+8.6 points) without evaluation-set tuning, with gains concentrated on hard questions where the dense baseline fails (+28.5 points). On MuSiQue, AAR achieves +10.1 points in the transductive setting. An inductive model trained on training-split associations and evaluated on…
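The reranking step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's implementation: the two-layer MLP shape, the choice of the top dense hits as "seed" passages, the averaging of forward and backward association scores, the max over seeds, and the `weight` and `n_seeds` parameters are all hypothetical. The MLP weights are passed in as plain arrays; in the paper they would come from contrastive training on co-occurrence annotations.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def mlp_forward(x, w1, b1, w2, b2):
    """Hypothetical association MLP: maps a passage embedding to the region
    of embedding space where its associated passages are expected to lie."""
    hidden = np.maximum(x @ w1 + b1, 0.0)  # ReLU
    return l2_normalize(hidden @ w2 + b2)


def rerank(query_emb, cand_embs, params, n_seeds=2, weight=0.5):
    """Rerank dense-retrieval candidates with a bi-directional association bonus.

    Assumed scoring rule: final = dense similarity + weight * best association
    with any seed passage, where association averages both directions.
    """
    cand = l2_normalize(cand_embs)
    sim = cand @ l2_normalize(query_emb)       # dense similarity to the query
    seeds = np.argsort(-sim)[:n_seeds]         # top dense hits act as seeds
    proj = mlp_forward(cand, *params)          # associated-space projections
    forward = proj[seeds] @ cand.T             # seed -> candidate, (n_seeds, n)
    backward = (proj @ cand[seeds].T).T        # candidate -> seed, (n_seeds, n)
    assoc = 0.5 * (forward + backward)
    assoc[np.arange(len(seeds)), seeds] = 0.0  # a seed gets no bonus from itself
    final = sim + weight * assoc.max(axis=0)
    return np.argsort(-final), final


# Toy usage with random embeddings and untrained (random) MLP weights.
rng = np.random.default_rng(0)
dim, hidden_dim, n_cands = 8, 16, 6
params = (rng.normal(size=(dim, hidden_dim)), np.zeros(hidden_dim),
          rng.normal(size=(hidden_dim, dim)), np.zeros(dim))
order, scores = rerank(rng.normal(size=dim), rng.normal(size=(n_cands, dim)), params)
print(order)
```

The design point this sketch tries to capture is that a second-hop passage may have low similarity to the query yet be strongly associated with a first-hop passage that was retrieved; adding an association term lets such a passage move up the ranking. Setting `weight=0.0` recovers the plain dense ordering.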
