Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width

Zheng Liu; Chaofan Li; Shitao Xiao; Chaozhuo Li; Defu Lian; Yingxia Shao

arXiv:2501.16302·cs.CL·January 28, 2025

TL;DR

The paper introduces Matryoshka Re-Ranker, a configurable architecture for LLM-based re-ranking whose number of layers and per-layer sequence lengths can be customized at runtime, with dedicated optimization techniques to limit the resulting precision loss.

Contribution

The paper presents a flexible re-ranking architecture that supports runtime customization, and introduces techniques such as cascaded self-distillation and low-rank adaptation to mitigate the precision loss that this flexibility can cause.
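To make the idea of cascaded self-distillation more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each sub-architecture produces relevance scores for the same candidate list, that the scores of the next larger sub-architecture serve as soft targets, and that the mismatch is measured with a KL divergence. The function names and the choice of KL divergence are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two distributions over the same candidates."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cascaded_distillation_loss(scores_by_size: Sequence[Sequence[float]]) -> float:
    """Sum the teacher-to-student losses along the cascade.

    scores_by_size[0] holds the scores of the full model; every later entry
    holds the scores of the next smaller sub-architecture (an assumption of
    this sketch). Each sub-architecture is distilled from the one before it.
    """
    loss = 0.0
    for teacher, student in zip(scores_by_size, scores_by_size[1:]):
        loss += kl_divergence(softmax(teacher), softmax(student))
    return loss


# Hypothetical scores for three candidates from three nested sub-architectures.
example_loss = cascaded_distillation_loss(
    [[3.0, 1.0, 0.5], [2.5, 1.2, 0.4], [2.0, 1.5, 0.6]]
)
```

In this sketch the loss is zero when every sub-architecture ranks the candidates with the same score distribution as its larger neighbour, and grows as the smaller variants drift away from it.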

Findings

01. Outperforms existing re-ranking methods on the MSMARCO and BEIR datasets.

02. Maintains high accuracy across various compression levels and scenarios.

03. Demonstrates an effective trade-off between flexibility and precision.

Abstract

Large language models (LLMs) provide powerful foundations to perform fine-grained text re-ranking. However, they are often prohibitive in reality due to constraints on computation bandwidth. In this work, we propose a flexible architecture called Matryoshka Re-Ranker, which is designed to facilitate runtime customization of model layers and sequence lengths at each layer based on users' configurations. Consequently, the LLM-based re-rankers can be made applicable across various real-world situations. The increased flexibility may come at the cost of precision loss. To address this problem, we introduce a suite of techniques to optimize the performance. First, we propose cascaded self-distillation, where each sub-architecture learns to preserve a precise re-ranking performance from its super components, whose predictions can be exploited as smooth and…
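The runtime customization described in the abstract (a user-chosen number of layers, plus a sequence length for each layer) can be pictured with a small toy example. This is only a sketch under stated assumptions: `RuntimeConfig` and `run_sub_architecture` are hypothetical names, the layers are stand-in functions rather than transformer blocks, and shortening the sequence by simple truncation is a placeholder for whatever compression the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A "layer" here is any function mapping a hidden sequence to a new one.
Layer = Callable[[List[float]], List[float]]


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical user configuration: depth plus one width per active layer."""
    depth: int              # how many layers to run
    widths: Sequence[int]   # sequence length kept at each active layer

    def __post_init__(self) -> None:
        if self.depth != len(self.widths):
            raise ValueError("need exactly one width per active layer")


def run_sub_architecture(
    layers: Sequence[Layer], tokens: Sequence[float], cfg: RuntimeConfig
) -> List[float]:
    """Run only the first cfg.depth layers, shortening the sequence at each one."""
    if cfg.depth > len(layers):
        raise ValueError("configuration is deeper than the full model")
    hidden = list(tokens)
    for layer, width in zip(layers[: cfg.depth], cfg.widths):
        # Truncation is a placeholder for the real sequence compression.
        hidden = layer(hidden[:width])
    return hidden


# Toy full model: four layers that each add 1.0 to every position.
toy_layers: List[Layer] = [lambda h: [x + 1.0 for x in h] for _ in range(4)]
toy_tokens = [0.0] * 8

full_output = run_sub_architecture(toy_layers, toy_tokens, RuntimeConfig(4, (8, 8, 8, 8)))
small_output = run_sub_architecture(toy_layers, toy_tokens, RuntimeConfig(2, (6, 3)))
```

The same set of layers serves both calls; only the configuration changes, which is the sense in which a smaller re-ranker is nested inside the full one.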

Taxonomy

Topics: Constraint Satisfaction and Optimization