A Dense Representation Framework for Lexical and Semantic Matching

Sheng-Chieh Lin; Jimmy Lin

arXiv:2206.09912 · cs.IR · February 28, 2023 · 1 citation

TL;DR

This paper introduces a dense representation framework that combines lexical and semantic matching for text retrieval, achieving competitive effectiveness with lower query latency and smaller indexes.

Contribution

It proposes densifying high-dimensional lexical representations into low-dimensional dense lexical representations (DLRs) and combining them with dense semantic representations into dense hybrid representations (DHRs) to improve retrieval speed and effectiveness.

Findings

01. DLRs effectively approximate the original lexical representations.

02. DHRs outperform existing hybrid techniques in speed and accuracy.

03. The model is competitive with state-of-the-art retrievers in various settings.

Abstract

Lexical and semantic matching capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust than either alone. Prior work performs hybrid retrieval by conducting lexical and semantic matching using different systems (e.g., Lucene and Faiss, respectively) and then fusing their model outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster…
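
The abstract does not spell out how a lexical vector is densified, so the following is only a minimal sketch under stated assumptions: the vocabulary-sized vector is cut into contiguous slices, each slice is max-pooled, and the position of each maximum is kept so that scoring can ignore slices where query and document maxima come from different terms. The function names (`densify`, `gated_dot`, `hybrid_score`), the contiguous slicing, and the `weight` parameter are all hypothetical illustrations, not the authors' API.

```python
from typing import List, Tuple

# A dense lexical representation (DLR) in this sketch:
# (max value per slice, position of that max inside the slice).
DLR = Tuple[List[float], List[int]]


def densify(lexical: List[float], num_slices: int) -> DLR:
    """Max-pool a vocabulary-sized vector into num_slices (value, index) pairs.

    Assumption: slices are contiguous and of equal width.
    """
    if len(lexical) % num_slices != 0:
        raise ValueError("vector length must be divisible by num_slices")
    width = len(lexical) // num_slices
    values: List[float] = []
    indices: List[int] = []
    for s in range(num_slices):
        chunk = lexical[s * width:(s + 1) * width]
        best = max(range(width), key=lambda i: chunk[i])  # first max on ties
        values.append(chunk[best])
        indices.append(best)
    return values, indices


def gated_dot(q: DLR, d: DLR) -> float:
    """Inner product that counts a slice only when both maxima share a position."""
    (q_vals, q_idx), (d_vals, d_idx) = q, d
    return sum(a * b for a, b, i, j in zip(q_vals, d_vals, q_idx, d_idx) if i == j)


def hybrid_score(q_lex: DLR, d_lex: DLR,
                 q_sem: List[float], d_sem: List[float],
                 weight: float = 1.0) -> float:
    """Hypothetical hybrid score: gated lexical match plus weighted semantic dot product."""
    semantic = sum(a * b for a, b in zip(q_sem, d_sem))
    return gated_dot(q_lex, d_lex) + weight * semantic


# Toy vocabulary of 8 terms, densified into 2 slices of width 4.
query_lex = densify([0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0], 2)
doc_lex = densify([0.0, 3.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0], 2)
lexical_score = gated_dot(query_lex, doc_lex)
combined = hybrid_score(query_lex, doc_lex, [0.5, 0.5], [1.0, 0.0], weight=2.0)
```

In this toy case the gated score equals the exact inner product of the two original 8-dimensional vectors, because the only overlapping term is also the maximum of its slice in both vectors. In general the pooled form is an approximation, which is consistent with the abstract's claim that DLRs approximate the original lexical representations.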

Code & Models

Repositories

castorini/dhr (jax, Official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning