GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction
Zhijie Lin, Zhuofeng Li, Chenglei Dai, Wentian Bao, Shuai Lin, Enyun Yu, Haoxiang Zhang, Liang Zhao

TL;DR
GReF introduces a unified, efficient autoregressive reranking framework that improves recommendation quality and inference speed, enabling real-time deployment in large-scale systems.
Contribution
The paper presents GReF, a novel unified generative reranking framework with a bidirectional encoder and multi-token prediction, enabling end-to-end training and efficient inference.
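The summary does not describe how GReF's ordered multi-token prediction works internally. The sketch below shows only the generic idea behind the latency gain: emit several list positions per model call instead of one. `decode_list` and `toy_score_fn` are hypothetical names, and the fixed-relevance scorer is an assumption standing in for a trained model. This is not the paper's method.

```python
def decode_list(num_items, list_len, tokens_per_step, score_fn):
    """Greedily build an ordered list of distinct items, emitting up to
    `tokens_per_step` positions per model call. Returns (list, number_of_calls)."""
    placed = []
    calls = 0
    while len(placed) < list_len:
        k = min(tokens_per_step, list_len - len(placed))
        # One model call returns k score vectors, one per upcoming position,
        # all conditioned on the same prefix `placed`.
        heads = score_fn(placed, k)
        calls += 1
        for scores in heads:
            # Fill the positions in order, masking items already placed.
            candidates = [i for i in range(num_items) if i not in placed]
            placed.append(max(candidates, key=lambda i: scores[i]))
    return placed, calls


def toy_score_fn(prefix, k):
    # Hypothetical stand-in for a trained model: fixed per-item relevance
    # that ignores both the prefix and the position offset.
    relevance = [0.9, 0.1, 0.5, 0.7, 0.3]
    return [relevance for _ in range(k)]


for step in (1, 2, 4):
    print(step, decode_list(5, 4, step, toy_score_fn))
# 1 ([0, 3, 2, 4], 4)
# 2 ([0, 3, 2, 4], 2)
# 4 ([0, 3, 2, 4], 1)
```

The number of sequential model calls drops from `list_len` to `ceil(list_len / tokens_per_step)`. The lists match here only because the toy scorer ignores context. In a real model, later positions within one step cannot see items chosen earlier in that same step, and a multi-token scheme has to manage that trade-off.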
Findings
Outperforms state-of-the-art reranking methods in offline experiments.
Achieves near real-time latency comparable to non-autoregressive models.
Significantly improves online recommendation quality in a large-scale deployment.
Abstract
In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combinatorial space of permutations. Recent research follows a two-stage (generator-evaluator) paradigm, in which a generator produces multiple feasible sequences and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from low inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address these two challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder…
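The two-stage generator-evaluator paradigm can be sketched as follows. This is a minimal illustration under stated assumptions, not GReF or any specific baseline. A random sampler stands in for the autoregressive generator. The hypothetical evaluator scores a whole list as position-discounted relevance minus a penalty when adjacent items share a category, which is one simple kind of intra-list effect. `math.perm(n, k)` gives the size of the space of ordered length-k lists drawn from n items, and that size is why exhaustive search is impractical beyond toy sizes.

```python
import itertools
import math
import random


def generator(items, list_len, num_candidates, rng):
    # Hypothetical generator: random orderings stand in for an autoregressive model.
    return [tuple(rng.sample(items, list_len)) for _ in range(num_candidates)]


def evaluator(seq, relevance, category):
    # Hypothetical list-wise evaluator: discounted relevance minus a penalty
    # for each adjacent pair of items from the same category.
    gain = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(seq))
    penalty = sum(0.5 for a, b in zip(seq, seq[1:]) if category[a] == category[b])
    return gain - penalty


def rerank(items, list_len, relevance, category, num_candidates=50, seed=0):
    # Stage 1: generate candidates. Stage 2: the evaluator picks the best one.
    rng = random.Random(seed)
    candidates = generator(items, list_len, num_candidates, rng)
    return max(candidates, key=lambda s: evaluator(s, relevance, category))


def exhaustive_best(items, list_len, relevance, category):
    # Only feasible for tiny inputs: enumerates all math.perm(n, k) ordered lists.
    return max(itertools.permutations(items, list_len),
               key=lambda s: evaluator(s, relevance, category))


items = list(range(6))
relevance = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
category = ["a", "a", "b", "b", "c", "c"]
print(math.perm(len(items), 4))  # 360 ordered lists of length 4
print(rerank(items, 4, relevance, category))
```

Because the generator and evaluator are separate functions, gradients cannot flow from the evaluator's score back into the generator. This mirrors the end-to-end training issue the abstract raises.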
