Generative Reasoning Re-ranker

Mingfu Liang; Yufei Li; Jay Xu; Kavosh Asadi; Xi Liu; Shuo Gu; Kaushik Rangadurai; Frank Shyu; Shuaiwen Wang; Song Yang; Zhijing Li; Jiang Liu; Mengying Sun; Fei Tian; Xiaohan Wei; Chonglin Sun; Jacob Tao; Shike Mei; Wenlin Chen; Santanu Kolay; Sandeep Pandey; Hamed Firooz; Luke Simon

arXiv:2602.07774 · cs.IR · February 24, 2026

Open Access

TL;DR

This paper introduces GR2, an end-to-end generative reasoning reranker for recommendation systems that leverages large language models, semantic ID encoding, and reinforcement learning to improve ranking accuracy and scalability.

Contribution

The paper presents a novel three-stage training pipeline for LLM-based reranking, including semantic ID encoding, reasoning trace generation, and RL supervision with verifiable rewards.
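
As a rough illustration of what "semantic ID encoding" can look like, the sketch below maps an item embedding to a short tuple of codebook indices via residual quantization. This is an assumption for illustration only: this page does not say which tokenizer GR2 uses, and the names `semantic_id`, `nearest`, and `CODEBOOKS`, along with the toy codebook values, are hypothetical.

```python
def nearest(codebook, vec):
    """Return the index of the codebook row closest to vec (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def semantic_id(embedding, codebooks):
    """Residual quantization: each level quantizes what the previous level left over."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen code vector so the next level refines the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Hypothetical two-level codebooks over 2-d embeddings (toy values, not learned).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],     # coarse level
    [[0.0, 0.0], [0.25, -0.25]],  # fine level
]

item_a = semantic_id([1.2, 0.8], CODEBOOKS)
item_b = semantic_id([0.1, -0.1], CODEBOOKS)
print(item_a, item_b)
```

With a scheme like this, the vocabulary grows with the codebook sizes rather than with the number of items, which is the usual motivation for replacing opaque per-item identifiers.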

Findings

01  GR2 outperforms state-of-the-art baselines on Recall and NDCG.

02  Reasoning traces significantly improve reranking performance.

03  Verifiable reward design mitigates reward hacking in RL.
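
The metrics and the reward idea named in the findings can be sketched as follows. `recall_at_k` and `ndcg_at_k` follow the standard binary-relevance definitions. `verifiable_reward` is a hypothetical example of a rule-checkable reward, not the paper's actual design: it gives zero reward to any output that is not a permutation of the candidate list, which is one simple way to keep a policy from exploiting malformed outputs.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k with the usual 1 / log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def verifiable_reward(output, candidates, relevant, k):
    """Hypothetical reward: NDCG@k if the output is a valid reranking, else 0."""
    # A valid reranking uses every candidate exactly once and nothing else.
    if sorted(output) != sorted(candidates):
        return 0.0
    return ndcg_at_k(output, relevant, k)


candidates = ["a", "b", "c", "d"]
relevant = {"c"}
print(verifiable_reward(["c", "a", "b", "d"], candidates, relevant, k=2))
print(verifiable_reward(["a", "c", "b", "d"], candidates, relevant, k=2))
print(verifiable_reward(["c", "c", "c", "c"], candidates, relevant, k=2))
```

Because the reward is computed by a fixed rule against known labels, it can be checked without a learned reward model, which is the general sense in which such rewards are called verifiable.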

Abstract

Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on retrieval and ranking, while the reranking phase, critical for refining final recommendations, is largely overlooked; (2) LLMs are typically used in zero-shot or supervised fine-tuning settings, leaving their reasoning abilities, especially those enhanced through reinforcement learning (RL) and high-quality reasoning data, underexploited; (3) items are commonly represented by non-semantic IDs, creating major scalability challenges in industrial systems with billions of identifiers. To address these gaps, we propose the Generative Reasoning Reranker (GR2), an end-to-end framework with a three-stage training pipeline tailored for reranking. First, a pretrained…

Taxonomy

Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks