Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs

Mengqi Liao; Xiangyu Xi; Ruinian Chen; Jia Leng; Yangen Hu; Ke Zeng; Shuai Liu; Huaiyu Wan

arXiv:2505.18573 · cs.LG · October 21, 2025

TL;DR

This paper proposes a dynamic, difficulty-aware reinforcement learning method for LLMs that improves training efficiency and maintains exploration by adaptively allocating rollouts and adjusting temperature.

Contribution

It introduces a novel mechanism for dynamic rollout allocation and adaptive temperature adjustment to enhance RL training of LLMs.

Findings

01 Improved training efficiency through difficulty-based rollout allocation.

02 Maintained exploration with adaptive temperature control.

03 Enhanced response precision without sacrificing exploratory ability.
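
As a rough illustration of why temperature control relates to exploration, the sketch below shows that raising the softmax temperature increases the entropy of a sampling distribution. It also includes a simple controller that nudges the temperature toward a target entropy. The controller rule, the function names, and every parameter (`target_entropy`, `step`, `t_min`, `t_max`) are assumptions made for this example. The source does not describe the paper's actual adjustment strategy in enough detail to reproduce it.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adjust_temperature(temperature, current_entropy, target_entropy,
                       step=0.05, t_min=0.5, t_max=1.5):
    """Hypothetical controller: raise the temperature when entropy falls
    below the target, lower it when entropy is above, then clamp.
    This is an assumed rule, not the paper's confirmed strategy."""
    if current_entropy < target_entropy:
        temperature += step
    elif current_entropy > target_entropy:
        temperature -= step
    return min(t_max, max(t_min, temperature))


logits = [2.0, 1.0, 0.1]
low_t = entropy(softmax(logits, temperature=1.0))
high_t = entropy(softmax(logits, temperature=2.0))
```

Here `high_t` exceeds `low_t`: a higher temperature flattens the distribution, so sampling is more exploratory. The clamp keeps the temperature in a fixed range so that a run of low-entropy steps cannot push it arbitrarily high.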

Abstract

Reasoning large language models (LLMs) excel in complex tasks, which has drawn significant attention to reinforcement learning (RL) for LLMs. However, existing approaches allocate an equal number of rollouts to all questions during the RL process, which is inefficient. This inefficiency stems from the fact that training on simple questions yields limited gains, whereas more rollouts are needed for challenging questions to sample correct answers. Furthermore, while RL improves response precision, it limits the model's exploration ability, potentially resulting in a performance cap below that of the base model prior to RL. To address these issues, we propose a mechanism for dynamically allocating rollout budgets based on the difficulty of the problems, enabling more efficient RL training. Additionally, we introduce an adaptive dynamic temperature adjustment strategy to maintain the…
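
The allocation idea in the abstract (fewer rollouts for simple questions, more for challenging ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: difficulty is approximated as one minus an estimated pass rate, and the function name `allocate_rollouts`, the `min_rollouts` floor, and the proportional split are all hypothetical choices made for this example.

```python
def allocate_rollouts(pass_rates, total_budget, min_rollouts=2):
    """Split a fixed rollout budget across questions, giving more
    rollouts to questions with lower estimated pass rates.

    Assumption: difficulty = 1 - pass_rate (a hypothetical proxy).
    """
    n = len(pass_rates)
    if total_budget < n * min_rollouts:
        raise ValueError("budget too small for the per-question minimum")

    difficulty = [1.0 - p for p in pass_rates]
    spare = total_budget - n * min_rollouts
    total_difficulty = sum(difficulty)

    if total_difficulty == 0:
        # Every question looks trivially easy: split the spare budget evenly.
        shares = [spare / n] * n
    else:
        shares = [spare * d / total_difficulty for d in difficulty]

    alloc = [min_rollouts + int(s) for s in shares]

    # Hand out rollouts lost to truncation, largest fractional part first.
    leftover = total_budget - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


# Three questions: easy, medium, hard (estimated pass rates).
alloc = allocate_rollouts([0.9, 0.5, 0.1], total_budget=24)
```

With the same total of 24 rollouts that a uniform scheme would spend as 8 per question, the hardest question receives the largest share and the easiest the smallest. The per-question floor means easy questions still contribute some training signal.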

Code & Models

Repositories

liaomengqi/e3-rl4llms (PyTorch, official)

Videos

Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs · underline

Taxonomy

Topics: Scheduling and Optimization Algorithms

Methods: Softmax · Attention Is All You Need · Balanced Selection