Batched Contextual Reinforcement: A Task-Scaling Law for Efficient Reasoning

Bangji Yang; Hongbo Ma; Jiajun Fan; Ge Liu

arXiv:2604.02322·cs.LG·April 3, 2026

TL;DR

This paper introduces Batched Contextual Reinforcement (BCR), a training method in which a large language model solves multiple problems simultaneously within a shared context window, reducing per-problem token usage without sacrificing accuracy.

Contribution

The paper presents a single-stage training paradigm that places N problems in one shared context and rewards per-instance accuracy, improving reasoning efficiency and revealing a task-scaling law in LLMs.

Findings

01. Token usage per problem decreases as the number of concurrent problems increases.

02. BCR reduces token consumption by 15.8% to 62.6% while maintaining or improving accuracy.

03. Models exhibit emergent self-regulated efficiency, eliminating redundant reasoning loops.
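To make the per-problem token figures concrete, here is a minimal arithmetic sketch. The function names and the token counts in the usage note are hypothetical illustrations, not measurements from the paper; only the reported reduction range (15.8% to 62.6%) comes from the findings above.

```python
def per_problem_tokens(total_tokens: int, n_problems: int) -> float:
    """Average tokens spent per problem when N problems share one context."""
    if n_problems <= 0:
        raise ValueError("n_problems must be positive")
    return total_tokens / n_problems


def reduction_pct(baseline_tokens: float, batched_tokens: float) -> float:
    """Percentage reduction in per-problem tokens relative to a baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - batched_tokens) / baseline_tokens


# Hypothetical numbers: one batched response of 1200 tokens covering 4 problems,
# compared against an assumed single-problem baseline of 500 tokens per problem.
batched = per_problem_tokens(1200, 4)
saving = reduction_pct(500, batched)
```

Under these assumed counts the batched run averages 300 tokens per problem, a 40% reduction, which would fall inside the range reported above.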

Abstract

Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either degrade reasoning quality or require complex training pipelines. We introduce Batched Contextual Reinforcement, a minimalist, single-stage training paradigm that unlocks efficient reasoning through a simple structural modification: training the model to solve N problems simultaneously within a shared context window, rewarded purely by per-instance accuracy. This formulation creates an implicit token budget that yields several key findings: (1) We identify a novel task-scaling law: as the number of concurrent problems N increases during inference, per-problem token usage decreases monotonically while accuracy…
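The setup described in the abstract, N problems in one shared context with a reward based only on per-instance accuracy, could be sketched roughly as follows. The prompt template, the `Answer k:` output format, the exact-match check, and all function names are assumptions made for illustration; the abstract does not specify how the authors format prompts or score answers.

```python
import re
from typing import List, Optional

# Hypothetical answer format: the model is asked to emit "Answer k: <value>".
ANSWER_RE = re.compile(r"Answer\s+(\d+)\s*:\s*(.+)")


def build_batched_prompt(problems: List[str]) -> str:
    """Place N problems in a single shared context (template is an assumption)."""
    lines = [
        f"Solve the following {len(problems)} problems. "
        "Give each final answer on its own line as 'Answer k: <value>'."
    ]
    for k, problem in enumerate(problems, start=1):
        lines.append(f"Problem {k}: {problem}")
    return "\n".join(lines)


def parse_answers(completion: str, n: int) -> List[Optional[str]]:
    """Extract one answer per problem index; missing answers stay None."""
    answers: List[Optional[str]] = [None] * n
    for match in ANSWER_RE.finditer(completion):
        k = int(match.group(1))
        if 1 <= k <= n:
            answers[k - 1] = match.group(2).strip()
    return answers


def per_instance_rewards(completion: str, references: List[str]) -> List[float]:
    """Score each problem independently: 1.0 for an exact match, else 0.0."""
    predicted = parse_answers(completion, len(references))
    return [
        1.0 if pred is not None and pred == ref.strip() else 0.0
        for pred, ref in zip(predicted, references)
    ]


# Usage with a made-up completion for two toy problems.
prompt = build_batched_prompt(["What is 2 + 2?", "What is 3 * 4?"])
completion = "2 + 2 is 4 and 3 * 4 is 10.\nAnswer 1: 4\nAnswer 2: 10\n"
rewards = per_instance_rewards(completion, ["4", "12"])
```

Note that the sketch contains no length penalty: any pressure toward brevity would come only from the N problems sharing one context window, which is the implicit token budget the abstract refers to.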
