Reinforcement Speculative Decoding for Fast Ranking

Yingpeng Du; Tianjun Wei; Zhu Sun; Jie Zhang

arXiv:2505.20316 · cs.AI · May 28, 2025

TL;DR

This paper introduces Reinforcement Speculative Decoding, a novel method that uses reinforcement learning to improve the speed and accuracy of large language models in ranking tasks under strict latency constraints.

Contribution

It proposes an up-to-down decoding paradigm in which a reinforcement-learning-based agent iteratively modifies the ranking, leveraging listwise knowledge for faster inference in ranking systems.
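To make the contrast with left-to-right decoding concrete, here is a minimal sketch of a fixed-budget, whole-list refinement loop in the spirit of the up-to-down paradigm. It is not the authors' agent: `refine_ranking`, `make_swap_policy`, and the adjacent-swap policy are hypothetical stand-ins, assuming the draft is a complete ranking and each round may revise any position. The property it illustrates is that the number of rounds (and hence latency) is fixed in advance, and the full list is carried between rounds instead of being discarded.

```python
from typing import Callable, Dict, List, Sequence

Ranking = List[str]
# Hypothetical policy signature: takes the current full ranking, returns a revised one.
Policy = Callable[[Ranking], Ranking]


def refine_ranking(draft: Sequence[str], policy: Policy, rounds: int) -> Ranking:
    """Apply `policy` to the whole ranking for exactly `rounds` iterations.

    The number of policy calls is bounded by `rounds`, and every round sees
    the entire list, including items a prefix-based verifier would reject.
    """
    ranking = list(draft)
    for _ in range(rounds):
        ranking = policy(ranking)
    return ranking


def make_swap_policy(scores: Dict[str, float]) -> Policy:
    """Hypothetical stand-in policy (not the paper's RL agent): swap the first
    adjacent pair that is out of order according to `scores`, a proxy for
    an LLM's relevance judgement."""
    def policy(ranking: Ranking) -> Ranking:
        revised = list(ranking)
        for i in range(len(revised) - 1):
            if scores[revised[i]] < scores[revised[i + 1]]:
                revised[i], revised[i + 1] = revised[i + 1], revised[i]
                break  # one modification per round
        return revised
    return policy


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.2}
    draft = ["b", "a", "d", "c"]
    print(refine_ranking(draft, make_swap_policy(scores), rounds=2))  # ['a', 'b', 'c', 'd']
```

Here two rounds happen to suffice; with a smaller budget the loop simply returns the best ranking reached so far, which is what makes the latency predictable.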

Findings

01. Significant speedup in ranking inference tasks.

02. Maintains high ranking accuracy with reduced latency.

03. Effective in both information retrieval and recommender systems.

Abstract

Large Language Models (LLMs) have been widely adopted in ranking systems such as information retrieval (IR) systems and recommender systems (RSs). To alleviate the latency of auto-regressive decoding, some studies explore single (first) token decoding for ranking approximation, but they suffer from severe degradation at tail positions. Although speculative decoding (SD) methods can be a remedy by verifying at different positions, they face challenges in ranking systems due to their left-to-right decoding paradigm. First, ranking systems require strict latency constraints, but the number of verification rounds in SD methods is not known in advance. Second, SD methods usually discard listwise ranking knowledge about unaccepted items in previous rounds, hindering future multi-token prediction, especially when candidate tokens are the unaccepted items. In this paper, we propose a Reinforcement…
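The two baselines the abstract contrasts can be sketched in a few lines. This is an illustrative simulation, not code from the paper, and all names are hypothetical: `first_token_ranking` assumes you already have, for each candidate, the score an LLM assigns to its identifier token at the first decoding step, and `make_oracle` stands in for the target model's greedy choice of the next item. Single-step scoring yields a whole draft at once; left-to-right speculative verification then accepts the longest draft prefix that agrees with the target and corrects the first mismatch, so the number of rounds depends on how often the draft disagrees.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical target model: given the accepted prefix and the remaining
# candidates, return the item it would greedily decode next.
Target = Callable[[List[str], List[str]], str]


def first_token_ranking(first_token_scores: Dict[str, float]) -> List[str]:
    """Approximate a full ranking from one decoding step by sorting candidates
    on the (caller-supplied) score of their identifier token."""
    return sorted(first_token_scores, key=first_token_scores.get, reverse=True)


def make_oracle(true_order: Sequence[str]) -> Target:
    """Hypothetical target that always picks the best remaining item in `true_order`."""
    def target_next(prefix: List[str], remaining: List[str]) -> str:
        return min(remaining, key=list(true_order).index)
    return target_next


def speculative_verify(draft: Sequence[str], target_next: Target) -> Tuple[List[str], int]:
    """Left-to-right verification of a drafted ranking.

    Each round accepts draft items while they match the target's choice and
    replaces the first mismatch with the target's own item. Rejected items stay
    in the pool in draft order; no listwise knowledge about the rejection is kept.
    Returns the final ranking and the number of verification rounds used.
    """
    accepted: List[str] = []
    remaining = list(draft)
    rounds = 0
    while remaining:
        rounds += 1
        # Checked sequentially here for clarity; a real verifier scores all
        # draft positions in a single forward pass.
        for item in list(remaining):
            choice = target_next(accepted, remaining)
            accepted.append(choice)
            remaining.remove(choice)
            if choice != item:
                break  # first mismatch ends this round
    return accepted, rounds


if __name__ == "__main__":
    target = make_oracle(["a", "b", "c", "d"])
    draft = first_token_ranking({"a": 2.1, "c": 1.5, "b": 1.4, "d": 0.3})
    print(draft)                                           # ['a', 'c', 'b', 'd']
    print(speculative_verify(draft, target))               # (['a', 'b', 'c', 'd'], 2)
    print(speculative_verify(["d", "c", "b", "a"], target)[1])  # 4
```

A perfect draft needs one round, while a reversed draft of n items needs n rounds, which is why the round count cannot be bounded in advance under a fixed latency budget.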

Taxonomy

Topics: Advanced Algebra and Logic · Bayesian Modeling and Causal Inference · Logic, Reasoning, and Knowledge