A Dense Representation Framework for Lexical and Semantic Matching
Sheng-Chieh Lin, Jimmy Lin

TL;DR
This paper introduces a dense representation framework that combines lexical and semantic matching in a single retrieval system, achieving competitive effectiveness with lower query latency and smaller indexes.
Contribution
It proposes densifying high-dimensional lexical representations into low-dimensional dense lexical representations (DLRs), then combining them with dense semantic representations into dense hybrid representations (DHRs) to improve retrieval speed and effectiveness.
Findings
DLRs effectively approximate original lexical representations
DHRs outperform existing hybrid techniques in speed and accuracy
The framework is competitive with state-of-the-art retrievers in various retrieval settings
Abstract
Lexical and semantic matching capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust than either alone. Prior work performs hybrid retrieval by conducting lexical and semantic matching using different systems (e.g., Lucene and Faiss, respectively) and then fusing their model outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster…
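The abstract does not spell out how densification works, so the following is only a minimal sketch of one plausible scheme, assuming slice-wise max pooling: the vocabulary is split into contiguous slices, each slice keeps its largest term weight and the position of that term, and two vectors are scored only on slices where those positions agree. The names `densify`, `gated_inner_product`, and `hybrid_score`, the `weight` parameter, and the toy vectors are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

DLR = Tuple[List[float], List[int]]  # (pooled values, winning positions)


def densify(sparse: Dict[int, float], vocab_size: int, num_slices: int) -> DLR:
    """Max-pool a sparse lexical vector over contiguous vocabulary slices.

    Assumes non-negative term weights. A position of -1 marks an empty slice.
    """
    slice_width = -(-vocab_size // num_slices)  # ceiling division
    values = [0.0] * num_slices
    positions = [-1] * num_slices
    for term_id, weight in sparse.items():
        s = term_id // slice_width
        if weight > values[s]:
            values[s] = weight
            positions[s] = term_id % slice_width
    return values, positions


def gated_inner_product(q: DLR, d: DLR) -> float:
    """Sum products only over slices where both sides kept the same term."""
    (q_vals, q_pos), (d_vals, d_pos) = q, d
    return sum(
        qv * dv
        for qv, dv, qp, dp in zip(q_vals, d_vals, q_pos, d_pos)
        if qp == dp and qp != -1
    )


def hybrid_score(q_dlr: DLR, d_dlr: DLR,
                 q_sem: List[float], d_sem: List[float],
                 weight: float = 1.0) -> float:
    """Hypothetical fusion: lexical score plus a weighted semantic dot product."""
    semantic = sum(a * b for a, b in zip(q_sem, d_sem))
    return gated_inner_product(q_dlr, d_dlr) + weight * semantic


# Toy example: vocabulary of 8 terms pooled into 4 slices of width 2.
query = densify({0: 1.0, 5: 2.0}, vocab_size=8, num_slices=4)
doc = densify({0: 0.5, 4: 3.0, 5: 1.0}, vocab_size=8, num_slices=4)
lexical = gated_inner_product(query, doc)
combined = hybrid_score(query, doc, [1.0, 0.0], [0.5, 0.5])
```

In this toy case the exact sparse inner product is 2.5, while the gated score is 0.5, because term 4 wins its slice in the document and hides the match on term 5. The example illustrates that such a scheme is an approximation whose quality depends on the number of slices; how closely it tracks the original representations in practice is an empirical question the paper addresses.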
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
