Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
Lei Huang, Xiang Cheng, Chenxiao Zhao, Guobin Shen, Junjie Yang, Xiaocheng Feng, Yuxuan Gu, Xing Yu, Bing Qin

TL;DR
GOLF is a reinforcement learning framework that uses group-level natural language feedback to improve exploration efficiency and performance, with a reported 2.2× sample efficiency over reward-only RL methods.
Contribution
This work introduces GOLF, a novel RL approach that explicitly utilizes group-level natural language feedback for targeted exploration and continuous improvement.
Findings
Achieves 2.2× sample efficiency over reward-only RL methods.
Effectively exploits external critiques and intra-group attempts.
Improves exploration in sparse-reward environments.
Abstract
Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized and leading to inefficient exploration. In this work, we propose GOLF, an RL framework that explicitly exploits group-level language feedback to guide targeted exploration through actionable refinements. GOLF aggregates two complementary feedback sources: (i) external critiques that pinpoint errors or propose targeted fixes, and (ii) intra-group attempts that supply alternative partial ideas and diverse failure patterns. These group-level feedback signals are aggregated to produce high-quality refinements, which are adaptively injected into training as off-policy scaffolds to provide targeted guidance in sparse-reward…
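The aggregation and injection steps described in the abstract can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration and not the authors' implementation: the `Attempt` record, the prompt layout in `build_refinement_prompt`, the all-failed trigger in `needs_scaffold`, and the replace-the-worst rule in `inject_scaffold`. The abstract says only that refinements are "adaptively injected"; the exact criterion is not given in the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    """One sampled response in a group (hypothetical record type)."""
    response: str
    reward: float             # scalar reward, e.g. 0.0 for failure, 1.0 for success
    critique: str = ""        # external natural language critique, if any
    off_policy: bool = False  # True when the sample did not come from the current policy


def build_refinement_prompt(problem: str, group: List[Attempt]) -> str:
    """Combine external critiques and intra-group attempts into one prompt."""
    parts = [f"Problem:\n{problem}", "Previous attempts and critiques:"]
    for i, attempt in enumerate(group, start=1):
        parts.append(f"[Attempt {i}] {attempt.response}\n[Critique {i}] {attempt.critique}")
    parts.append("Write an improved solution that avoids the errors above.")
    return "\n\n".join(parts)


def needs_scaffold(group: List[Attempt]) -> bool:
    """Assumed trigger: every attempt in the group earned no reward."""
    return all(attempt.reward <= 0.0 for attempt in group)


def inject_scaffold(group: List[Attempt], refinement: Attempt) -> List[Attempt]:
    """Assumed rule: swap the lowest-reward attempt for a successful refinement."""
    if not needs_scaffold(group) or refinement.reward <= 0.0:
        return list(group)
    worst = min(range(len(group)), key=lambda i: group[i].reward)
    mixed = list(group)
    mixed[worst] = Attempt(refinement.response, refinement.reward, off_policy=True)
    return mixed


# Toy usage: both attempts fail, so the refinement is injected as a scaffold.
group = [
    Attempt("x = 3", 0.0, "Sign error in step 2."),
    Attempt("x = 5", 0.0, "Dropped the constant term."),
]
prompt = build_refinement_prompt("Solve 2x + 4 = 0.", group)
refined = Attempt("x = -2", 1.0)  # stand-in for a model-generated refinement
mixed = inject_scaffold(group, refined)
```

In this sketch the `off_policy` flag only marks which samples would need separate handling in the policy update; how GOLF actually treats those samples is not specified in the text above.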
Taxonomy
Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
