GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

Shangjian Yin; Yu Fu; Yue Dong; Zhouxing Shi

arXiv:2605.15464 · cs.LG · May 18, 2026

TL;DR

GRLO demonstrates that a small amount of RLHF in open-ended environments can significantly enhance language models' generalization, reducing training costs while maintaining competitive performance.

Contribution

This work shows that limited RLHF training in open-ended environments can transfer conversational skills to downstream tasks, reducing data and compute needs.

Findings

01. GRLO improves average performance from 24.1 to 63.1 with minimal prompts and compute.

02. It requires 46 times less data and 68 times less compute than in-domain RLVR.

03. A subsequent RLVR stage offers only limited additional gains, mainly on complex benchmarks.

Abstract

Post-training has become a crucial step for unlocking the capabilities of large language models, with reinforcement learning (RL) emerging as a critical paradigm. Recent RL-based post-training has increasingly split into two paradigms: reinforcement learning from human feedback (RLHF), which optimizes models using human preference signals in target domains, and reinforcement learning from verifiable rewards (RLVR), which operates in verifier-backed environments. The latter has dominated recent reasoning-oriented post-training because it delivers stronger gains and higher efficiency on domain-specific tasks (e.g., reasoning). However, although in-domain RL training achieves promising performance, it still requires a substantial amount of GPU compute, which remains a major barrier to broad adoption. In this work, we study the generalization ability of RLHF learned from scratch from a…

Code & Models

Repositories

SJY8460/GRLO (GitHub)
