D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

TL;DR
D2LLM combines the efficiency of bi-encoders with the nuanced understanding of cross-encoders: it decomposes an LLM cross-encoder into a bi-encoder whose embeddings can be pre-computed, then distills the LLM's knowledge into that model to improve semantic search accuracy.
Contribution
The paper proposes a decomposed and distilled LLM framework that balances accuracy and efficiency in semantic search, outperforming existing models across multiple tasks.
Findings
Surpasses five baseline models in all evaluated metrics.
Improves NLI task performance by at least 6.45%.
Demonstrates effective combination of bi-encoder efficiency with cross-encoder nuance.
Abstract
The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLMs (Decomposed and Distilled LLMs for semantic search), which combine the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading…
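To make the decomposition in the abstract more concrete, here is a minimal NumPy sketch of its two named components. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pma_pool`, `interaction_score`), the weight shapes, the single learnable query, the two-layer ReLU network, and the sigmoid output are all hypothetical choices, and the residual connections and layer normalization used in full attention-pooling blocks are left out. The weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pma_pool(tokens, query, w_q, w_k, w_v, num_heads):
    """Pool (seq_len, d) token states into one (d,) vector.

    A single learnable query attends over all tokens with multi-head
    attention. Simplified: no residual connection or layer norm.
    """
    seq_len, d = tokens.shape
    d_head = d // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)              # (h, d_head)
    k = (tokens @ w_k).reshape(seq_len, num_heads, d_head)    # (n, h, d_head)
    v = (tokens @ w_v).reshape(seq_len, num_heads, d_head)    # (n, h, d_head)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_head)  # (h, n)
    weights = softmax(scores, axis=-1)                        # each head sums to 1
    pooled = np.einsum("hn,nhd->hd", weights, v)              # (h, d_head)
    return pooled.reshape(d), weights


def interaction_score(query_emb, passage_emb, w1, b1, w2, b2):
    """Hypothetical interaction module: a small MLP over the concatenated
    query and passage embeddings, returning a relevance score in (0, 1)."""
    pair = np.concatenate([query_emb, passage_emb])  # (2d,)
    hidden = np.maximum(0.0, pair @ w1 + b1)         # ReLU layer
    logit = hidden @ w2 + b2
    return float(1.0 / (1.0 + np.exp(-logit)))


# Demo with random (untrained) weights; all sizes are arbitrary.
rng = np.random.default_rng(0)
d, heads, hidden_dim = 8, 2, 16
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
learned_query = rng.normal(size=d)
w1, b1 = rng.normal(size=(2 * d, hidden_dim)), np.zeros(hidden_dim)
w2, b2 = rng.normal(size=hidden_dim), 0.0

# Passage embeddings can be computed once, offline.
passage_emb, _ = pma_pool(rng.normal(size=(12, d)), learned_query, w_q, w_k, w_v, heads)
# At query time only the query is encoded, then the cheap MLP scores the pair.
query_emb, attn = pma_pool(rng.normal(size=(5, d)), learned_query, w_q, w_k, w_v, heads)
print(interaction_score(query_emb, passage_emb, w1, b1, w2, b2))
```

In this sketch, the passage embedding never depends on the query, so it can be stored ahead of time. Only the small MLP runs per query-passage pair, which is the pre-computability the abstract refers to.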
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Softmax · Attention Is All You Need
