TL;DR
This paper applies the Rational Speech Acts (RSA) framework to sparse neural information retrieval. RSA adjusts each document's token weights using the other documents in the collection, which sharpens the contrast between document representations and improves retrieval performance.
Contribution
The paper adapts the RSA framework to IR, including the large feature space that tokens create, so that token-document interactions are modulated by the other documents in the dataset, leading to improved retrieval accuracy.
Findings
Consistent improvement across multiple sparse retrieval models
Achieves state-of-the-art results on out-of-domain datasets from the BEIR benchmark
Effectively models complex term interactions in document representations
Abstract
Current sparse neural information retrieval (IR) methods, and to a lesser extent more traditional models such as BM25, do not take into account the document collection and the complex interplay between different term weights when representing a single document. In this paper, we show how the Rational Speech Acts (RSA) framework, used in linguistics to minimize the number of features to be communicated when identifying an object in a set, can be adapted to the IR case -- and in particular to the high number of potential features (here, tokens). RSA dynamically modulates token-document interactions by considering the influence of other documents in the dataset, better contrasting document representations. Experiments show that incorporating RSA consistently improves multiple sparse retrieval models and achieves state-of-the-art performance on out-of-domain datasets from the BEIR benchmark.…
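The idea of contrasting a document's token weights against the rest of the collection can be illustrated with one step of the textbook RSA recursion (literal listener, then pragmatic speaker). This is a minimal sketch, not the authors' formulation: the function name `rsa_reweight`, the rationality parameter `alpha`, the uniform prior over documents, and the toy weight matrix are all assumptions made for illustration.

```python
import numpy as np

def rsa_reweight(weights, alpha=1.0, eps=1e-12):
    """One generic RSA step over a (n_docs, n_tokens) non-negative weight matrix.

    Hypothetical helper for illustration; assumes a uniform prior over documents.
    """
    w = np.asarray(weights, dtype=float)
    # Literal listener: for each token, a distribution over documents
    # (normalize each column). Tokens shared by many documents are spread thin.
    listener = w / (w.sum(axis=0, keepdims=True) + eps)
    # Pragmatic speaker: for each document, a distribution over tokens
    # (normalize each row), favoring tokens that single the document out.
    speaker = listener ** alpha
    return speaker / (speaker.sum(axis=1, keepdims=True) + eps)

# Toy collection: token 0 is shared by both documents with a high raw weight,
# tokens 1 and 2 each appear in only one document with a lower raw weight.
weights = np.array([
    [2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
])
reweighted = rsa_reweight(weights)
print(reweighted)
```

In this toy case the shared token starts with the largest raw weight in both documents, yet after the RSA step each document's distinctive token outweighs it, which is the kind of collection-aware contrast the abstract describes.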
