Ranking-aware Reinforcement Learning for Ordinal Ranking
Aiming Hao, Chen Zhu, Jiashu Zhu, Jiahong Wu, Xiangxiang Chu

TL;DR
This paper introduces RARL, a reinforcement learning framework that explicitly models ordinal dependencies in ranking tasks, combining regression and ranking objectives with novel reward and exploration techniques.
Contribution
The paper proposes a unified RL framework for ordinal ranking that integrates regression and Learning-to-Rank with a ranking-aware reward and Response Mutation Operations.
Findings
RARL outperforms existing methods on three benchmarks.
The ranking-aware reward improves model alignment with ordinal dependencies.
Response Mutation Operations enhance exploration and training stability.
Abstract
Ordinal regression and ranking are challenging due to inherent ordinal dependencies that conventional methods struggle to model. We propose Ranking-Aware Reinforcement Learning (RARL), a novel RL framework that explicitly learns these relationships. At its core, RARL features a unified objective that synergistically integrates regression and Learning-to-Rank (L2R), enabling mutual improvement between the two tasks. This is driven by a ranking-aware verifiable reward that jointly assesses regression precision and ranking accuracy, facilitating direct model updates via policy optimization. To further enhance training, we introduce Response Mutation Operations (RMO), which inject controlled noise to improve exploration and prevent stagnation at saddle points. The effectiveness of RARL is validated through extensive experiments on three distinct benchmarks.
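The abstract describes a reward that jointly scores regression precision and ranking accuracy but does not give its formula. The sketch below is one hypothetical way such a reward could be assembled, not the authors' definition: the function names, the linear error decay, the pairwise-concordance ranking term, and the `scale` and `alpha` parameters are all assumptions made for illustration.

```python
from itertools import combinations


def regression_reward(pred, target, scale):
    """Assumed form: 1.0 for an exact prediction, decaying linearly to 0.0 at |error| >= scale."""
    return max(0.0, 1.0 - abs(pred - target) / scale)


def ranking_reward(preds, targets):
    """Assumed form: fraction of pairs with distinct targets that preds order correctly."""
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(targets)), 2):
        if targets[i] == targets[j]:
            continue  # tied targets carry no ordering information
        comparable += 1
        if (preds[i] - preds[j]) * (targets[i] - targets[j]) > 0:
            concordant += 1
    # With no comparable pairs there is nothing to get wrong.
    return concordant / comparable if comparable else 1.0


def ranking_aware_reward(preds, targets, scale=1.0, alpha=0.5):
    """Hypothetical joint reward: weighted sum of mean regression reward and ranking reward."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("preds and targets must be non-empty and of equal length")
    reg = sum(regression_reward(p, t, scale) for p, t in zip(preds, targets)) / len(targets)
    rank = ranking_reward(preds, targets)
    return alpha * reg + (1.0 - alpha) * rank


if __name__ == "__main__":
    # A group of three items scored by the policy, compared with their ordinal labels.
    print(ranking_aware_reward([1.2, 2.1, 2.9], [1, 2, 3], scale=4.0))
    print(ranking_aware_reward([3.0, 2.0, 1.0], [1, 2, 3], scale=4.0))
```

Because both terms are computed directly from predictions and labels, a reward of this shape is verifiable in the sense the abstract uses: it needs no learned reward model. A fully reversed ordering scores zero on the ranking term even when individual errors are moderate, which is the kind of ordinal signal a pure regression reward would miss.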
Taxonomy
Topics: Recommender Systems and Techniques · Ethics and Social Impacts of AI · Machine Learning and Data Classification
