GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction

Zhijie Lin; Zhuofeng Li; Chenglei Dai; Wentian Bao; Shuai Lin; Enyun Yu; Haoxiang Zhang; Liang Zhao

arXiv:2510.25220 · cs.IR · October 30, 2025

TL;DR

GReF introduces a unified, efficient autoregressive reranking framework that improves recommendation quality and inference speed, enabling real-time deployment in large-scale systems.

Contribution

The paper presents GReF, a novel unified generative reranking framework with a bidirectional encoder and multi-token prediction, enabling end-to-end training and efficient inference.
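The efficiency argument for multi-token prediction comes down to counting sequential model calls. The sketch below is a generic illustration, not GReF's Ordered Multi-token Prediction, whose details are not given in this summary. It assumes a hypothetical `score_fn(prefix, remaining)` that stands in for one forward pass. With `tokens_per_step=1` it behaves like ordinary one-item-per-step autoregressive decoding. With `tokens_per_step=k` it commits the k best items from each pass, so a list of length L needs ceil(L / k) passes instead of L.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: (prefix so far, remaining candidates) -> one score per candidate.
ScoreFn = Callable[[List[str], List[str]], List[float]]


def decode(candidates: Sequence[str], score_fn: ScoreFn, list_len: int,
           tokens_per_step: int = 1) -> Tuple[List[str], int]:
    """Greedy list decoding that commits `tokens_per_step` items per model call.

    Returns the generated list and the number of (hypothetical) forward passes.
    tokens_per_step=1 is ordinary one-item-per-step autoregressive decoding.
    """
    if tokens_per_step < 1:
        raise ValueError("tokens_per_step must be >= 1")
    remaining = list(candidates)
    output: List[str] = []
    calls = 0
    while len(output) < list_len and remaining:
        scores = score_fn(output, remaining)  # one forward pass, conditioned on the prefix
        calls += 1
        k = min(tokens_per_step, list_len - len(output), len(remaining))
        # Take the k best remaining items, in descending score order.
        top = sorted(range(len(remaining)), key=lambda i: scores[i], reverse=True)[:k]
        output.extend(remaining[i] for i in top)
        picked = set(top)
        remaining = [item for i, item in enumerate(remaining) if i not in picked]
    return output, calls


if __name__ == "__main__":
    # Toy scorer that ignores the prefix; a real model would condition on it.
    static: Dict[str, float] = {f"item{i}": float(i) for i in range(10)}
    toy_score: ScoreFn = lambda prefix, rem: [static[it] for it in rem]
    for k in (1, 3, 4):
        seq, calls = decode(list(static), toy_score, list_len=6, tokens_per_step=k)
        print(k, calls, seq)
```

With the prefix-independent toy scorer, every setting of `tokens_per_step` yields the same list. Only the number of passes changes: 6, 2, and 2 for k = 1, 3, 4. With a real model, committing several items from one pass means those items are not conditioned on each other. That is the kind of quality/latency trade-off an ordered multi-token scheme would need to handle.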

Findings

1. Outperforms state-of-the-art reranking methods in offline experiments.
2. Achieves near real-time latency comparable to non-autoregressive models.
3. Significantly improves online recommendation quality in a large-scale deployment.

Abstract

In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in searching for the optimal sequence within the combinatorial space of permutations. Recent work follows a two-stage (generator-evaluator) paradigm, in which a generator produces multiple feasible sequences and an evaluator selects the best one. In practice, the generator is typically implemented as an autoregressive model. However, these two-stage methods face two main challenges. First, the separation of the generator and evaluator hinders end-to-end training. Second, autoregressive generators suffer from low inference efficiency. In this work, we propose a Unified Generative Efficient Reranking Framework (GReF) to address these two challenges. Specifically, we introduce Gen-Reranker, an autoregressive generator featuring a bidirectional encoder…
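The two-stage generator-evaluator paradigm described in the abstract can be sketched as below. This is a toy illustration, not GReF or any specific published system. The generator is a hypothetical stand-in that samples random permutations where a real system would sample from a trained autoregressive model. The evaluator is a hypothetical position-discounted sum of per-item scores. Function names and the scoring rule are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence


def generate_candidates(items: Sequence[str], num_candidates: int, seed: int = 0) -> List[List[str]]:
    """Hypothetical generator stand-in: samples random permutations.

    A real two-stage system would use a trained (typically autoregressive)
    model here; random shuffles only mimic the shape of its output.
    """
    rng = random.Random(seed)
    candidates: List[List[str]] = []
    for _ in range(num_candidates):
        perm = list(items)
        rng.shuffle(perm)
        candidates.append(perm)
    return candidates


def evaluate(sequence: Sequence[str], item_score: Callable[[str], float]) -> float:
    """Hypothetical list-wise evaluator: position-discounted sum of item scores.

    It ignores intra-list correlations, which a real evaluator would model.
    """
    return sum(item_score(item) / (pos + 1) for pos, item in enumerate(sequence))


def rerank_two_stage(items: Sequence[str], item_score: Callable[[str], float],
                     num_candidates: int = 8, seed: int = 0) -> List[str]:
    """Generator proposes candidate lists; evaluator picks the highest-scoring one."""
    candidates = generate_candidates(items, num_candidates, seed)
    return max(candidates, key=lambda seq: evaluate(seq, item_score))


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.7}
    best = rerank_two_stage(list(scores), scores.__getitem__, num_candidates=6)
    print(best, round(evaluate(best, scores.__getitem__), 3))
```

Because the final choice is a discrete `max` over separately generated candidates, no training signal flows from the evaluator back to the generator in this sketch. That is one concrete way to see the end-to-end training problem the abstract attributes to separating the two stages.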
