Stop Overthinking: Unlocking Efficient Listwise Reranking with Minimal Reasoning
Danyang Liu, Kan Li

TL;DR
This paper introduces a Length-Regularized Self-Distillation method that reduces reasoning tokens in listwise reranking with LLMs, maintaining effectiveness while improving efficiency for real-time applications.
Contribution
It proposes a novel framework that synthesizes high-quality, minimal reasoning traces to train models that are both accurate and computationally efficient.
Findings
Reduces inference token usage by 34% to 37% across benchmarks.
Maintains ranking effectiveness comparable to larger models.
Addresses overthinking by pruning redundant reasoning in LLM-based rerankers.
Abstract
Listwise reranking utilizing Large Language Models (LLMs) has achieved state-of-the-art retrieval effectiveness. Recently, reasoning-enhanced models have further pushed these boundaries by employing Chain-of-Thought (CoT) to perform deep comparative analysis of candidate documents. However, this performance gain comes at a prohibitive computational cost, as models often generate thousands of reasoning tokens before producing a final ranking. In this work, we investigate the relationship between reasoning length and ranking quality, revealing an overthinking phenomenon where extended reasoning yields diminishing returns. To address this, we propose a Length-Regularized Self-Distillation framework. We synthesize a dataset by sampling diverse reasoning traces from a teacher model (Rank-K) and applying a Pareto-inspired filter to select traces that achieve high ranking performance with…
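The trace-selection step described in the abstract can be illustrated with a small sketch. The abstract says only that the filter is "Pareto-inspired", so the code below is an assumption rather than the authors' procedure: it keeps the plain Pareto front over two axes, ranking quality and reasoning length. The `Trace` class, the `ndcg` field, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One sampled reasoning trace (hypothetical structure, not from the paper)."""
    text: str
    ndcg: float      # quality of the ranking produced after this trace (assumed metric)
    num_tokens: int  # number of reasoning tokens in the trace


def dominates(a: Trace, b: Trace) -> bool:
    """True if `a` is at least as good as `b` on both axes and strictly better on one."""
    no_worse = a.ndcg >= b.ndcg and a.num_tokens <= b.num_tokens
    strictly_better = a.ndcg > b.ndcg or a.num_tokens < b.num_tokens
    return no_worse and strictly_better


def pareto_filter(traces: list[Trace]) -> list[Trace]:
    """Keep traces that no other trace dominates, preserving input order."""
    return [t for t in traces if not any(dominates(o, t) for o in traces)]


# Illustrative values only; these are not results from the paper.
traces = [
    Trace("long, accurate", ndcg=0.80, num_tokens=2000),
    Trace("short, accurate", ndcg=0.80, num_tokens=600),
    Trace("short, weak", ndcg=0.55, num_tokens=300),
    Trace("long, weak", ndcg=0.50, num_tokens=1800),
]
kept = pareto_filter(traces)
print([t.text for t in kept])
```

On this toy input the filter drops both long traces, since the 600-token trace matches or beats them on quality at lower cost. It still keeps the short but weak trace, because nothing is both shorter and better. A practical pipeline would likely add a minimum-quality threshold on top of the front; whether the paper does so is not stated in the text available here.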
