Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning
Bangji Yang, Hongbo Ma, Jiajun Fan, Ge Liu

TL;DR
This paper introduces Batched Contextual Reinforcement, a training method enabling large language models to solve multiple problems simultaneously within a shared context, reducing token usage and improving efficiency without sacrificing accuracy.
Contribution
The paper presents a single-stage training paradigm that uses batching to improve reasoning efficiency, and it identifies a task-scaling law in LLMs.
Findings
Token usage per problem decreases as the number of concurrent problems increases.
Batched Contextual Reinforcement (BCR) reduces token consumption by 15.8% to 62.6% while maintaining or improving accuracy.
Models exhibit emergent self-regulated efficiency, eliminating redundant reasoning loops.
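The first two findings rest on simple arithmetic: per-problem token usage is the total number of generated tokens divided by the number of problems in the batch, and the reported reduction is measured relative to a baseline. A minimal sketch of that bookkeeping follows. The function names and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def tokens_per_problem(total_tokens: int, n_problems: int) -> float:
    """Average number of generated tokens spent on each problem in a batch."""
    if n_problems <= 0:
        raise ValueError("n_problems must be positive")
    return total_tokens / n_problems


def relative_reduction(baseline: float, batched: float) -> float:
    """Fraction of per-problem tokens saved relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - batched) / baseline


# Illustrative, made-up numbers (NOT from the paper):
# a baseline that spends 1000 tokens on one problem, and a batched run
# that spends 2400 tokens in total on 4 problems sharing one context.
baseline = tokens_per_problem(1000, 1)
batched = tokens_per_problem(2400, 4)
saving = relative_reduction(baseline, batched)
```

With these made-up inputs, the batched run spends 600 tokens per problem, a 40% saving over the baseline.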
Abstract
Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require complex training pipelines. We introduce Batched Contextual Reinforcement, a minimalist, single-stage training paradigm that unlocks efficient reasoning through a simple structural modification: training the model to solve N problems simultaneously within a shared context window, rewarded purely by per-instance accuracy. This formulation creates an implicit token budget that yields several key findings: (1) We identify a novel task-scaling law: as the number of concurrent problems N increases during inference, per-problem token usage decreases monotonically while accuracy…
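The structural modification described in the abstract, packing N problems into one shared context and rewarding per-instance accuracy, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt template, the `Answer k:` output format, the exact-match comparison, and the choice to average correctness over the N problems are all assumptions made here for the example.

```python
import re


def build_batched_prompt(problems):
    """Pack N problems into one shared-context prompt (hypothetical template)."""
    header = (
        f"Solve the following {len(problems)} problems. "
        "Give each final answer on its own line as 'Answer k: <value>'.\n\n"
    )
    body = "\n".join(f"Problem {k}: {p}" for k, p in enumerate(problems, start=1))
    return header + body


def parse_answers(completion, n):
    """Return the last 'Answer k: value' found for each k; missing ones are None."""
    answers = [None] * n
    for match in re.finditer(r"Answer\s+(\d+)\s*:\s*(.+)", completion):
        k = int(match.group(1))
        if 1 <= k <= n:
            answers[k - 1] = match.group(2).strip()
    return answers


def per_instance_accuracy_reward(completion, references):
    """Fraction of the N problems answered correctly (assumed aggregation).

    No length penalty is applied: any pressure toward shorter reasoning
    comes only from the N problems sharing a single context window.
    """
    n = len(references)
    answers = parse_answers(completion, n)
    correct = sum(a is not None and a == ref for a, ref in zip(answers, references))
    return correct / n


prompt = build_batched_prompt(["What is 2 + 2?", "What is 3 * 3?"])
reward = per_instance_accuracy_reward("Answer 1: 4\nAnswer 2: 10", ["4", "9"])
```

Here the hypothetical completion answers the first problem correctly and the second incorrectly, so the reward is 0.5. In a real training setup the scalar would feed a reinforcement learning objective, which this sketch leaves out.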
