RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

Andrew Choi; Wei Xu

arXiv:2605.11151·cs.AI·May 21, 2026

TL;DR

RankQ introduces a self-supervised ranking loss to improve offline-to-online reinforcement learning, enabling better policy refinement and transfer in sparse reward and vision-based robotic tasks.

Contribution

The paper proposes a ranking-based Q-learning objective that improves value estimation by learning relative action preferences. The authors report that it outperforms prior methods on D4RL benchmarks and on robot learning tasks.

Findings

01. RankQ achieves state-of-the-art performance on D4RL benchmarks.

02. In robot learning, RankQ significantly improves simulation success rates.

03. RankQ enables effective sim-to-real transfer in robotic manipulation.

Abstract

Offline-to-online reinforcement learning (RL) improves sample efficiency by leveraging pre-collected datasets prior to online interaction. A key challenge, however, is learning an accurate critic in large state-action spaces with limited dataset coverage. To mitigate harmful updates from value overestimation, prior methods impose pessimism by down-weighting out-of-distribution (OOD) actions relative to dataset actions. While effective, this essentially acts as a behavior cloning anchor and can hinder downstream online policy improvement when dataset actions are suboptimal. We propose RankQ, an offline-to-online Q-learning objective that augments temporal-difference learning with a self-supervised multi-term ranking loss to enforce structured action ordering. By learning relative action preferences rather than uniformly penalizing unseen actions, RankQ shapes the Q-function such that…
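
The general idea of adding a ranking term to a temporal-difference (TD) loss can be illustrated with a minimal sketch. This is not RankQ's actual objective: the abstract does not give the form of its multi-term ranking loss or say how action preferences are constructed. The hinge-style pairwise loss, the `margin` and `rank_weight` parameters, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np

def td_loss(q_sa, reward, q_next, done, gamma=0.99):
    """Mean squared one-step TD error for a batch of transitions."""
    target = reward + gamma * (1.0 - done) * q_next
    return float(np.mean((q_sa - target) ** 2))

def pairwise_ranking_loss(q_preferred, q_other, margin=0.1):
    """Hinge loss (an assumed form): penalize pairs where the preferred
    action's Q-value does not exceed the other's by at least `margin`."""
    gap = q_preferred - q_other
    return float(np.mean(np.maximum(0.0, margin - gap)))

def combined_loss(q_sa, reward, q_next, done, q_preferred, q_other,
                  gamma=0.99, margin=0.1, rank_weight=1.0):
    """TD loss plus a weighted ranking term (hypothetical combination)."""
    td = td_loss(q_sa, reward, q_next, done, gamma)
    rank = pairwise_ranking_loss(q_preferred, q_other, margin)
    return td + rank_weight * rank

# Toy batch of two transitions and two (preferred, other) action pairs.
q_sa = np.array([1.0, 2.0])
reward = np.array([0.0, 1.0])
q_next = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])
q_preferred = np.array([1.0, 0.0])
q_other = np.array([0.5, 0.2])

loss = combined_loss(q_sa, reward, q_next, done, q_preferred, q_other,
                     gamma=0.5, margin=0.1, rank_weight=2.0)
```

In this sketch the ranking term only constrains the relative order of Q-values within each pair, so an unseen action is penalized only when it is ranked above an action assumed to be better, not uniformly.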
