D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

Zihan Liao; Hang Yu; Jianguo Li; Jun Wang; Wei Zhang

arXiv:2406.17262·cs.CL·June 26, 2024

TL;DR

D2LLM introduces a novel approach that combines the efficiency of bi-encoders with the nuanced understanding of cross-encoders through decomposition and knowledge distillation, significantly enhancing semantic search performance.

Contribution

The paper proposes a decomposed and distilled LLM framework that balances accuracy and efficiency in semantic search, outperforming existing models across multiple tasks.

Findings

01  Surpasses five baseline models in all evaluated metrics.

02  Improves NLI task performance by at least 6.45%.

03  Demonstrates effective combination of bi-encoder efficiency with cross-encoder nuance.

Abstract

The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLMs (Decomposed and Distilled LLMs for semantic search), which combine the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading…
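The decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single-head simplification of attention pooling, a two-layer MLP standing in for the Interaction Emulation Module, and random vectors in place of LLM token states. All names (`attention_pool`, `interaction_score`, `pool_query`, `w_hidden`, `w_out`) are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(tokens, query):
    """Pool token vectors into one embedding using a learned query.

    Assumption: one head, no key/value projections (a simplification
    of Pooling by Multihead Attention).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def interaction_score(q_emb, p_emb, w_hidden, w_out):
    """Hypothetical stand-in for an interaction module: a small MLP over
    the concatenated query and passage embeddings, squashed to (0, 1)."""
    x = q_emb + p_emb  # list concatenation
    hidden = [max(0.0, dot(row, x)) for row in w_hidden]  # ReLU layer
    logit = dot(w_out, hidden)
    return 1.0 / (1.0 + math.exp(-logit))

random.seed(0)
DIM = 4

def rand_vec(n):
    return [random.gauss(0.0, 0.5) for _ in range(n)]

# Randomly initialised parameters; in practice these would be trained.
pool_query = rand_vec(DIM)
w_hidden = [rand_vec(2 * DIM) for _ in range(8)]
w_out = rand_vec(8)

# Passage embeddings are computed once, offline (the bi-encoder property).
passages = {
    "p1": [rand_vec(DIM) for _ in range(5)],
    "p2": [rand_vec(DIM) for _ in range(3)],
}
passage_embs = {pid: attention_pool(toks, pool_query) for pid, toks in passages.items()}

# At query time only the query is encoded; the lightweight MLP then
# scores it against each cached passage embedding.
query_tokens = [rand_vec(DIM) for _ in range(4)]
query_emb = attention_pool(query_tokens, pool_query)
scores = {
    pid: interaction_score(query_emb, emb, w_hidden, w_out)
    for pid, emb in passage_embs.items()
}
```

The point of the split is that the expensive encoding step runs once per passage, while the per-pair cost at query time is only the small scoring network.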

Code & Models

Repositories

codefuse-ai/d2llm (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need