TL;DR
This paper introduces TAS-Balanced, a training method for dense retrieval models that reaches state-of-the-art results among low-latency retrievers while training on a single GPU. It avoids the compute cost of approaches that sample negatives from a continuously refreshed index or rely on very large batches.
Contribution
The paper presents a novel topic-aware sampling technique and dual-teacher supervision for training dense retrievers efficiently on limited hardware, outperforming existing methods.
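The topic-aware part of the sampling can be illustrated with a minimal sketch. It assumes every query already carries a cluster id from a one-time clustering step and that each training batch is filled from a single cluster. The function name `topic_aware_batches` and its parameters are hypothetical, and neither the balanced margin sampling of passage pairs nor the dual-teacher supervision is shown.

```python
import random
from collections import defaultdict


def topic_aware_batches(cluster_ids, batch_size, num_batches, seed=0):
    """Yield batches of query indices, each drawn from one query cluster.

    cluster_ids[i] is the (precomputed) cluster of query i. This is an
    illustrative sketch, not the authors' implementation.
    """
    rng = random.Random(seed)

    # Group query indices by their cluster id.
    clusters = defaultdict(list)
    for query_idx, cluster in enumerate(cluster_ids):
        clusters[cluster].append(query_idx)

    # Assumption: only clusters that can fill a whole batch are used.
    eligible = [m for m in clusters.values() if len(m) >= batch_size]
    if not eligible:
        raise ValueError("no cluster has at least batch_size queries")

    for _ in range(num_batches):
        members = rng.choice(eligible)          # pick one topic cluster
        yield rng.sample(members, batch_size)   # queries without replacement


# Toy example: 7 queries in two clusters (hypothetical assignments).
toy_cluster_ids = [0, 0, 0, 1, 1, 1, 1]
toy_batches = list(topic_aware_batches(toy_cluster_ids, batch_size=3, num_batches=5))
```

Every batch in `toy_batches` holds queries from one cluster only, so the queries that share a batch are topically related.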
Findings
Achieves state-of-the-art low-latency retrieval results
Outperforms BM25 and previous dense models on TREC-DL datasets
Trains on a single consumer-grade GPU
Abstract
A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows. The neural IR community made great advancements in training effective dual-encoder dense retrieval (DR) models recently. A dense text retrieval model uses a single vector representation per query and passage to score a match, which enables low-latency first stage retrieval with a nearest neighbor search. Increasingly common, training approaches require enormous compute power, as they either conduct negative passage sampling out of a continuously updating refreshing index or require very large batch sizes for in-batch negative sampling. Instead of relying on more compute capability, we introduce an efficient topic-aware query and balanced margin sampling technique, called TAS-Balanced. We cluster queries once before training and…
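The scoring scheme described in the abstract, one vector per query and per passage with a nearest neighbor search over the passages, can be sketched as follows. This is a toy, exhaustive search in plain Python under the assumption that the match score is a dot product. The vectors are hand-written placeholders rather than model outputs, the helper names are hypothetical, and a real system would typically use an (approximate) nearest neighbor index instead of a full scan.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_passages(query_vec, passage_vecs, k=2):
    """Return the k best (passage_index, score) pairs, highest score first.

    Scans every passage vector; fine for a toy example only.
    """
    scored = ((dot(query_vec, p), idx) for idx, p in enumerate(passage_vecs))
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]


# Hypothetical 2-dimensional embeddings for three passages and one query.
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]
results = top_k_passages(query, passages, k=2)
```

Because passage vectors do not depend on the query, they can be encoded and indexed once ahead of time, which is what makes this kind of first-stage retrieval low-latency at query time.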
