UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection

Yang Zhao; Kai Xiong; Xiao Ding; Li Du; Yangou Ouyang; Zhouhao Sun; Jiannan Guan; Wenbin Zhang; Bin Liu; Dong Hu; Bing Qin; Ting Liu

arXiv:2505.12457·cs.LG·May 20, 2025

TL;DR

UFO-RL is an uncertainty-based data selection method for reinforcement learning with LLMs. By training only on data within the model's potential comprehension zone, it cuts training time by up to 16x while matching or exceeding full-data performance.

Contribution

The paper presents UFO-RL, a novel framework that uses single-pass uncertainty estimation for data selection, enabling faster and more effective RL training of LLMs.

Findings

01. Data evaluation is up to 185x faster than with multi-sampling methods.

02. Training on the 10% of data selected by UFO-RL matches or exceeds full-data performance.

03. Overall training time drops by up to 16x, with improved stability.

Abstract

Scaling RL for LLMs is computationally expensive, largely due to multi-sampling for policy optimization and evaluation, making efficient data selection crucial. Inspired by the Zone of Proximal Development (ZPD) theory, we hypothesize LLMs learn best from data within their potential comprehension zone. Addressing the limitation of conventional, computationally intensive multi-sampling methods for data assessment, we introduce UFO-RL. This novel framework uses a computationally efficient single-pass uncertainty estimation to identify informative data instances, achieving up to 185x faster data evaluation. UFO-RL leverages this metric to select data within the estimated ZPD for training. Experiments show that training with just 10% of data selected by UFO-RL yields performance comparable to or surpassing full-data training, reducing overall training time by up to 16x while enhancing…
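
The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the uncertainty metric or the boundaries of the ZPD, so everything specific here is an assumption made for illustration: the score is taken to be the mean token probability from a single forward pass, the "zone" is taken to be the examples whose score lies closest to a mid-range target of 0.5, and the names `mean_token_confidence` and `select_fraction` are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def mean_token_confidence(token_logprobs: Sequence[float]) -> float:
    """Average probability assigned to the tokens of one sequence.

    Assumption: the per-token log-probabilities come from a single
    forward pass; this is a stand-in for the paper's uncertainty metric,
    which the abstract does not specify.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def select_fraction(
    scores: Dict[str, float], fraction: float = 0.1, target: float = 0.5
) -> List[str]:
    """Keep the `fraction` of examples whose score is closest to `target`.

    Assumption: `target=0.5` models "neither too easy nor too hard";
    the paper may define the zone differently.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(fraction * len(scores)))  # always keep at least one example
    ranked = sorted(scores, key=lambda ex_id: abs(scores[ex_id] - target))
    return ranked[:k]


# Hypothetical per-token log-probabilities for four training examples.
logprobs = {
    "easy": [math.log(0.95), math.log(0.95)],
    "mid": [math.log(0.60), math.log(0.50)],
    "mid2": [math.log(0.40), math.log(0.40)],
    "hard": [math.log(0.05), math.log(0.05)],
}
scores = {ex_id: mean_token_confidence(lps) for ex_id, lps in logprobs.items()}
selected = select_fraction(scores, fraction=0.5)
```

Each example is scored once, so the cost of evaluation is one forward pass per example rather than many sampled rollouts, which is the source of the speedup the abstract describes.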

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)