Efficient Listwise Reranking with Compressed Document Representations
Hervé Déjean, Stéphane Clinchant

TL;DR
This paper introduces RRK, a listwise reranker that compresses documents into fixed-size embeddings, enabling faster and more effective reranking, especially for long documents, with minimal performance loss.
Contribution
The paper proposes a novel document compression method for reranking that improves efficiency and effectiveness, particularly on long-document benchmarks, using a simple distillation training approach.
Findings
RRK (8B parameters) runs 3x-18x faster than smaller rerankers (0.6-4B parameters).
RRK matches or outperforms smaller rerankers in effectiveness.
Efficiency gains are greater on long-document benchmarks.
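One way to see why the gains grow with document length is to count the input positions in a single listwise prompt. The sketch below assumes each compressed document occupies a fixed number of embedding slots k. The values of k, the query length, the candidate count and the document lengths are all hypothetical, not taken from the paper. It also ignores the cost of producing the compressed embeddings.

```python
def listwise_input_length(query_tokens, doc_lengths, slots_per_doc=None):
    """Count input positions for one listwise reranking prompt.

    slots_per_doc=None: each candidate is inserted as raw text tokens.
    slots_per_doc=k: each candidate occupies exactly k embedding slots
    (hypothetical fixed budget), independent of its length.
    """
    if slots_per_doc is None:
        return query_tokens + sum(doc_lengths)
    return query_tokens + slots_per_doc * len(doc_lengths)


QUERY_TOKENS = 32  # hypothetical query length in tokens
K = 8              # hypothetical number of slots per compressed document
N_DOCS = 20        # hypothetical number of candidates reranked together

for label, doc_len in [("passages", 200), ("long documents", 2000)]:
    docs = [doc_len] * N_DOCS
    raw = listwise_input_length(QUERY_TOKENS, docs)
    compressed = listwise_input_length(QUERY_TOKENS, docs, K)
    print(f"{label}: raw={raw}, compressed={compressed}, ratio={raw / compressed:.1f}x")
```

With these assumed values, the compressed prompt stays at 192 positions whether documents have 200 or 2000 tokens. The raw prompt grows from 4032 to 40032 positions. These ratios count input positions only. They are not comparable to the reported 3x-18x speedups, which compare RRK against smaller rerankers rather than against an uncompressed version of itself.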
Abstract
Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing smaller LLMs or controlling input length. Inspired by recent advances in document compression for retrieval-augmented generation (RAG), we introduce RRK, an efficient and effective listwise reranker compressing documents into multi-token fixed-size embedding representations. Our simple training via distillation shows that this combination of rich compressed representations and listwise reranking yields a highly efficient and effective system. In particular, our 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness. The efficiency gains are even more striking on long-document…
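The sketch below illustrates what a "multi-token fixed-size" representation means for the reranker's input layout. The abstract does not describe how RRK's compressor works. `compress_to_fixed_slots` therefore uses mean pooling over contiguous chunks of token vectors purely as a hypothetical stand-in, not the paper's learned compressor. The function names and slot count are also illustrative assumptions.

```python
from typing import List

Vector = List[float]


def compress_to_fixed_slots(token_vectors: List[Vector], k: int) -> List[Vector]:
    """Hypothetical stand-in compressor: split a document's token vectors into
    k contiguous chunks and mean-pool each chunk, returning exactly k vectors.
    (RRK's actual compressor is not described in the abstract.)"""
    if k <= 0:
        raise ValueError("k must be positive")
    n = len(token_vectors)
    if n == 0:
        raise ValueError("document must contain at least one token vector")
    dim = len(token_vectors[0])
    slots = []
    for i in range(k):
        # Spread n tokens as evenly as possible over k chunks.
        start, end = i * n // k, (i + 1) * n // k
        # If the document is shorter than k, reuse the nearest token.
        chunk = token_vectors[start:end] or [token_vectors[start]]
        slots.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return slots


def build_listwise_input(query_vectors: List[Vector],
                         docs: List[List[Vector]], k: int) -> List[Vector]:
    """Concatenate the query positions with k slots per candidate document,
    so every candidate costs the same number of positions."""
    sequence = list(query_vectors)
    for doc in docs:
        sequence.extend(compress_to_fixed_slots(doc, k))
    return sequence


doc = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
print(compress_to_fixed_slots(doc, 2))  # [[2.0, 0.0], [0.0, 4.0]]
```

In this layout, a 5-token document and a 100-token document each contribute exactly k positions. This fixed per-candidate cost is what keeps a listwise prompt short when candidates are long.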
