Quality and Cost Trade-offs in Passage Re-ranking Task
Pavel Podberezko, Vsevolod Mitskevich, Raman Makouski, Pavel Goncharov, Andrei Khobnia, Nikolay Bushkov, Marina Chernyshevich

TL;DR
This paper explores how to balance the quality and computational cost of passage re-ranking in information retrieval by evaluating various transformer-based models and techniques to optimize efficiency without significantly sacrificing accuracy.
Contribution
It introduces a systematic analysis of different re-ranking architectures and methods, including binarization, to improve efficiency in real-time retrieval systems.
Findings
Binarization of transformer outputs reduces memory footprint.
Late-interaction models like ColBERT and Poly-encoder perform well with optimized configurations.
Trade-offs between model size, speed, and ranking quality are quantified.
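To make the first two findings concrete, here is a minimal sketch of ColBERT-style late-interaction scoring (MaxSim) combined with a simple binarization of token embeddings. It is an illustration under stated assumptions, not the paper's implementation: the sign-based binarization scheme, the embedding dimension of 128, the random embeddings, and the helper names maxsim_score, binarize, and unpack_to_pm1 are all hypothetical choices made for this example.

```python
import numpy as np

def maxsim_score(query_vecs, passage_vecs):
    """ColBERT-style late interaction: for each query token, take the
    maximum dot product over all passage tokens, then sum over query tokens."""
    sim = query_vecs @ passage_vecs.T  # shape: (query_tokens, passage_tokens)
    return float(sim.max(axis=1).sum())

def binarize(vecs):
    """Sign binarization (an assumption, not necessarily the paper's scheme):
    keep one bit per dimension and pack 8 bits into each byte."""
    bits = (vecs > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

def unpack_to_pm1(packed, dim):
    """Restore packed bits to a float matrix with entries in {-1, +1}."""
    bits = np.unpackbits(packed, axis=1)[:, :dim]
    return bits.astype(np.float32) * 2.0 - 1.0

# Random stand-ins for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
dim = 128
query = rng.standard_normal((8, dim)).astype(np.float32)
passage = rng.standard_normal((100, dim)).astype(np.float32)

# Passage embeddings can be precomputed, binarized, and stored offline.
packed_passage = binarize(passage)

float_score = maxsim_score(query, passage)
binary_score = maxsim_score(query, unpack_to_pm1(packed_passage, dim))

# float32 uses 32 bits per dimension, the packed form uses 1 bit.
compression = passage.nbytes / packed_passage.nbytes
print(f"float score: {float_score:.2f}, binary score: {binary_score:.2f}")
print(f"storage reduction: {compression:.0f}x")
```

In this sketch only the stored passage side is binarized, so the transformer still runs once per query while the passage index shrinks by the ratio printed at the end. How much ranking quality is lost in exchange is exactly the kind of trade-off the paper measures; the sketch itself makes no claim about it.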
Abstract
Deep learning models called transformers have achieved state-of-the-art results in the vast majority of NLP tasks, at the cost of increased computational complexity and high memory consumption. Using a transformer model for real-time inference becomes a major challenge in production because it requires expensive computational resources. The more executions of a transformer are needed, the lower the overall throughput, while switching to smaller encoders decreases accuracy. Our paper is devoted to the problem of choosing the right architecture for the ranking step of the information retrieval pipeline, so that the number of required calls to the transformer encoder is minimal while the ranking quality is as high as achievable. We investigated several late-interaction models, such as the ColBERT and Poly-encoder architectures, along with their modifications. Also, we…
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques
