Gumbel Reranking: Differentiable End-to-End Reranker Optimization

Siyuan Huang; Zhiyuan Ma; Jintao Du; Changhua Meng; Weiqiang Wang; Jingwen Leng; Minyi Guo; Zhouhan Lin

arXiv:2502.11116·cs.CL·June 10, 2025



TL;DR

This paper introduces Gumbel Reranking, a novel end-to-end training framework for rerankers that uses the Gumbel Trick to optimize document relevance ranking, improving recall in retrieval tasks.

Contribution

It reframes reranking as an attention-mask problem and proposes a differentiable method for end-to-end reranker optimization using stochastic Top-$k$ attention masks.

Findings

Achieves a 10.4% recall improvement on HotpotQA.

Demonstrates consistent performance gains across various settings.

Addresses training-inference misalignment issues in reranker training.

Abstract

RAG systems rely on rerankers to identify relevant documents. However, fine-tuning these models remains challenging due to the scarcity of annotated query-document pairs. Existing distillation-based approaches suffer from training-inference misalignment and fail to capture interdependencies among candidate documents. To overcome these limitations, we reframe the reranking process as an attention-mask problem and propose Gumbel Reranking, an end-to-end training framework for rerankers aimed at minimizing the training-inference gap. In our approach, reranker optimization is reformulated as learning a stochastic, document-wise Top-$k$ attention mask using the Gumbel Trick and Relaxed Top-$k$ Sampling. This formulation enables end-to-end optimization by minimizing the overall language loss. Experiments across various settings consistently demonstrate performance gains, including a 10.4%…
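To make the idea of a stochastic, document-wise Top-$k$ mask more concrete, here is a minimal sketch of one common relaxed Top-$k$ construction: Gumbel noise is added to the reranker scores, and a tempered softmax is applied $k$ times, each round down-weighting documents already softly selected. This is an illustration under stated assumptions, not the authors' implementation. The function name `relaxed_topk_mask`, the temperature `tau`, the clamp `eps`, and the example scores are all hypothetical. NumPy is used only to show the forward computation; actual training would need an autodiff framework so that gradients of the language loss can flow back into the scores.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def relaxed_topk_mask(scores, k, tau=0.5, rng=None, eps=1e-10):
    """Return a soft k-hot mask over documents; its entries sum to k.

    scores: one relevance score per candidate document (hypothetical input).
    tau:    softmax temperature; smaller values give a sharper mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)

    # Gumbel trick: perturb each score with i.i.d. Gumbel(0, 1) noise so that
    # the selection is a random sample rather than a fixed arg-top-k.
    keys = scores + rng.gumbel(size=scores.shape)

    mask = np.zeros_like(scores)
    onehot = np.zeros_like(scores)
    for _ in range(k):
        # Down-weight documents that were (softly) picked in earlier rounds.
        keys = keys + np.log(np.maximum(1.0 - onehot, eps))
        # A tempered softmax is a smooth stand-in for a one-hot argmax.
        onehot = softmax(keys / tau)
        mask += onehot
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reranker_scores = [2.0, 0.5, 1.5, -1.0]  # made-up scores for 4 documents
    print(relaxed_topk_mask(reranker_scores, k=2, tau=0.5, rng=rng))
```

In the attention-mask framing described above, a mask like this would scale how much the reader attends to each candidate document's tokens. The sketch stops at producing the mask and does not model the reader or the language loss.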


Taxonomy

Topics: Robotic Path Planning Algorithms

Methods: Softmax · Attention Is All You Need