ScalingNote: Scaling up Retrievers with Large Language Models for Real-World Dense Retrieval
Suyuan Huang, Chao Zhang, Yuanyuan Wu, Haoxin Zhang, Yuan Wang, Maolin Wang, Shaosheng Cao, Tong Xu, Xiangyu Zhao, Zengchang Qin, Yan Gao, Yunhan Bai, Jun Fan, Yao Hu, and Enhong Chen

TL;DR
ScalingNote introduces a two-stage approach that leverages large language models to improve dense retrieval performance while controlling online latency, validated through theoretical and empirical results in industrial settings.
Contribution
The paper presents a novel two-stage method for scaling dense retrieval with LLMs, combining training and distillation to balance performance and online efficiency.
Findings
Outperforms end-to-end models in relevance and efficiency
Verifies the scaling law of dense retrieval with LLMs in industry
Enables cost-effective, scalable dense retrieval systems
Abstract
Dense retrieval in most industries employs dual-tower architectures to retrieve query-relevant documents. Due to online deployment requirements, existing real-world dense retrieval systems mainly enhance performance by designing negative sampling strategies, overlooking the advantages of scaling up. Recently, Large Language Models (LLMs) have exhibited superior performance that can be leveraged for scaling up dense retrieval. However, scaling up retrieval models significantly increases online query latency. To address this challenge, we propose ScalingNote, a two-stage method to exploit the scaling potential of LLMs for retrieval while maintaining online query latency. The first stage is training dual towers, both initialized from the same LLM, to unlock the potential of LLMs for dense retrieval. Then, we distill only the query tower using mean squared error loss and cosine similarity…
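As a rough illustration of the second stage, the sketch below combines a mean squared error term with a cosine-similarity term between a small student query embedding and the LLM teacher's query embedding. The abstract is cut off at this point, so the way the two terms are actually combined is not stated; the function names and the `alpha` and `beta` weights are assumptions made for illustration, not the authors' implementation.

```python
import math


def mse_loss(student, teacher):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distillation_loss(student, teacher, alpha=1.0, beta=1.0):
    """Hypothetical combined objective for query-tower distillation.

    alpha and beta are illustrative weights (assumed, not from the paper).
    The cosine term is written as 1 - cos so that it is zero when the
    student and teacher embeddings point in the same direction.
    """
    mse_term = mse_loss(student, teacher)
    cos_term = 1.0 - cosine_similarity(student, teacher)
    return alpha * mse_term + beta * cos_term


# Toy query embeddings: teacher from the LLM query tower, student from a small model.
teacher_embedding = [1.0, 0.0]
aligned_student = [1.0, 0.0]
orthogonal_student = [0.0, 1.0]

loss_aligned = distillation_loss(aligned_student, teacher_embedding)
loss_orthogonal = distillation_loss(orthogonal_student, teacher_embedding)
```

Under this reading, only the query side is compressed: document embeddings could still come from the large document tower, computed offline, while the small student handles queries online. That split is an inference from the abstract's emphasis on online query latency rather than a detail it spells out.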
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
