Context Bootstrapped Reinforcement Learning
Saaket Agashe, Jayanth Srinivasa, Gaowen Liu, Ramana Kompella, Xin Eric Wang

TL;DR
This paper introduces Context Bootstrapped Reinforcement Learning (CBRL), a method that improves exploration and learning efficiency in reinforcement learning by stochastically prepending few-shot demonstrations to training prompts, leading to better reasoning and domain adaptation.
Contribution
CBRL is a novel approach that enhances Reinforcement Learning from Verifiable Rewards (RLVR) through curriculum-based demonstration injection, improving exploration and reasoning capabilities.
Findings
CBRL consistently improves success rates across multiple tasks.
CBRL enhances exploration efficiency in reinforcement learning.
CBRL is effective across different model families.
Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that require the acquisition of novel reasoning patterns or domain-specific knowledge. To address this, we propose Context Bootstrapped Reinforcement Learning (CBRL), which augments RLVR training by stochastically prepending few-shot demonstrations to training prompts. The injection probability follows a curriculum that starts high to bootstrap early exploration, then anneals to zero so the model must ultimately succeed without assistance. This forces the policy to internalize reasoning patterns from the demonstrations rather than relying on them at test time. We validate CBRL across two model families and five Reasoning Gym tasks. Our results demonstrate…
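The abstract describes the mechanism but not the exact schedule. Below is a minimal Python sketch built on stated assumptions. The injection probability decays linearly from a hypothetical `p_start` of 0.9 to zero over the first half of training (`anneal_frac`). Each assisted prompt gets `k = 2` demonstrations, separated from the prompt by blank lines. The function names `injection_probability` and `maybe_inject_demos` are illustrative and do not come from the paper. Each prompt gets its own coin flip. Early batches therefore mix assisted and unassisted prompts, and once the probability reaches zero every rollout has to succeed from the bare prompt.

```python
import random
from typing import Optional, Sequence


def injection_probability(step: int, total_steps: int,
                          p_start: float = 0.9,
                          anneal_frac: float = 0.5) -> float:
    """Hypothetical linear curriculum: p_start at step 0, decaying to 0.

    The probability reaches 0 at anneal_frac * total_steps and stays there,
    so the rest of training uses bare prompts only.
    """
    anneal_steps = max(1, int(anneal_frac * total_steps))
    if step >= anneal_steps:
        return 0.0
    return p_start * (1.0 - step / anneal_steps)


def maybe_inject_demos(prompt: str, demos: Sequence[str], p: float,
                       k: int = 2,
                       rng: Optional[random.Random] = None) -> str:
    """With probability p, prepend k randomly chosen demonstrations to prompt.

    rng.random() is in [0, 1), so p = 0 never injects and p = 1 always does.
    """
    rng = rng if rng is not None else random.Random()
    if demos and rng.random() < p:
        shots = rng.sample(list(demos), min(k, len(demos)))
        return "\n\n".join(shots + [prompt])
    return prompt


if __name__ == "__main__":
    rng = random.Random(0)
    demos = ["Q: 2+2? A: 4", "Q: 3*3? A: 9", "Q: 10-7? A: 3"]
    task = "Q: 6+5? A:"
    total_steps = 1000
    for step in (0, 250, 500, 900):
        p = injection_probability(step, total_steps)
        # Assumption: the verifier would score only the answer to `task`;
        # the demonstrations change the context, not the reward target.
        batch = [maybe_inject_demos(task, demos, p, rng=rng) for _ in range(200)]
        injected = sum(b != task for b in batch)
        print(f"step={step:4d}  p={p:.2f}  injected={injected}/200")
```

Keeping the schedule separate from prompt construction means another decay shape (cosine, step-wise) can be swapped in without changing how prompts are built. The schedule the authors actually use is not stated in this summary.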
