Context Bootstrapped Reinforcement Learning

Saaket Agashe; Jayanth Srinivasa; Gaowen Liu; Ramana Kompella; Xin Eric Wang

arXiv:2603.18953·cs.LG·March 20, 2026

TL;DR

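This paper introduces Context Bootstrapped Reinforcement Learning (CBRL), a method that improves exploration efficiency in Reinforcement Learning from Verifiable Rewards (RLVR) by stochastically prepending few-shot demonstrations to training prompts. The injection probability anneals to zero over training, so the model must internalize the demonstrated reasoning patterns and domain knowledge rather than rely on them at test time.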

Contribution

CBRL augments RLVR training with curriculum-based demonstration injection: few-shot demonstrations are prepended to training prompts with a probability that starts high to bootstrap early exploration and then anneals to zero.

Findings

01. CBRL consistently improves success rates across the evaluated Reasoning Gym tasks.

02. CBRL improves exploration efficiency in RLVR training by helping the model produce successful rollouts early on.

03. CBRL is effective across the two model families tested.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that require the acquisition of novel reasoning patterns or domain-specific knowledge. To address this, we propose Context Bootstrapped Reinforcement Learning (CBRL), which augments RLVR training by stochastically prepending few-shot demonstrations to training prompts. The injection probability follows a curriculum that starts high to bootstrap early exploration, then anneals to zero so the model must ultimately succeed without assistance. This forces the policy to internalize reasoning patterns from the demonstrations rather than relying on them at test time. We validate CBRL across two model families and five Reasoning Gym tasks. Our results demonstrate…
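
The injection mechanism described in the abstract can be illustrated with a minimal Python sketch. The abstract says only that the injection probability starts high and anneals to zero. The linear schedule, the starting value `p_start`, the `anneal_fraction` cutoff, the number of demonstrations `k`, and both function names below are assumptions made for illustration, not details taken from the paper.

```python
import random


def injection_probability(step, total_steps, p_start=0.5, anneal_fraction=0.8):
    """Hypothetical linear schedule: p_start at step 0, reaching 0 after
    anneal_fraction of training and staying there (assumed shape)."""
    anneal_steps = max(1, int(total_steps * anneal_fraction))
    remaining = max(0.0, 1.0 - step / anneal_steps)
    return p_start * remaining


def maybe_prepend_demos(task_prompt, demonstrations, p, rng, k=2):
    """With probability p, prepend up to k randomly chosen demonstrations
    to the task prompt; otherwise return the prompt unchanged."""
    if demonstrations and rng.random() < p:
        shots = rng.sample(demonstrations, min(k, len(demonstrations)))
        return "\n\n".join(shots + [task_prompt])
    return task_prompt


if __name__ == "__main__":
    rng = random.Random(0)
    demos = ["Q: 2 + 2 = ?\nA: 4", "Q: 3 * 3 = ?\nA: 9", "Q: 10 - 7 = ?\nA: 3"]
    total_steps = 100
    for step in (0, 40, 80, 100):
        p = injection_probability(step, total_steps)
        prompt = maybe_prepend_demos("Q: 5 + 6 = ?\nA:", demos, p, rng)
        print(f"step={step} p={p:.2f} prompt_chars={len(prompt)}")
```

Once the schedule reaches zero, every training prompt is the bare task prompt, which matches the abstract's requirement that the model ultimately succeed without assistance.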

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling