When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You

TL;DR
This paper introduces Side-by-Side (SxS) Interleaved Reasoning, a method that lets an LLM decide when to disclose partial answer content while its private reasoning continues, improving accuracy and content-latency trade-offs on reasoning tasks.
Contribution
The paper proposes an interleaved reasoning approach that makes disclosure timing a controllable decision within standard autoregressive generation, releasing content only once the reasoning so far supports it.
Findings
SxS improves accuracy across multiple architectures and scales.
SxS achieves better content-latency trade-offs on benchmarks.
Training with entailment-aligned trajectories enhances reasoning quality.
Abstract
In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments that bias subsequent generations. We introduce Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation. SxS interleaves partial disclosures with continued private reasoning in the same context, but releases content only when it is supported by the reasoning so far. To learn such pacing without incentivizing filler, we construct entailment-aligned interleaved trajectories by matching answer prefixes to supporting reasoning prefixes, then train with SFT to acquire the dual-action semantics and RL to recover…
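The trajectory construction described in the abstract (matching answer prefixes to supporting reasoning prefixes) can be illustrated with a minimal sketch. This is not the authors' implementation: the `think`/`say` channel labels, the function names, and the `supports` callable (a stand-in for whatever entailment check the paper uses) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a trajectory is a list of (channel, text)
# pairs, where channel is "think" (private) or "say" (disclosed).
Segment = Tuple[str, str]
Supports = Callable[[List[str], List[str]], bool]


def align_and_interleave(
    reasoning: List[str], answer: List[str], supports: Supports
) -> List[Segment]:
    """Interleave reasoning steps and answer pieces.

    Each answer piece is disclosed as soon as the reasoning prefix emitted
    so far supports the answer prefix ending at that piece. `supports` is
    an assumed stand-in for an entailment check.
    """
    trajectory: List[Segment] = []
    r = 0  # number of reasoning steps emitted so far
    for k in range(len(answer)):
        # Keep reasoning privately until the answer prefix is supported.
        # If the reasoning runs out first, the piece is disclosed anyway.
        while r < len(reasoning) and not supports(reasoning[:r], answer[: k + 1]):
            trajectory.append(("think", reasoning[r]))
            r += 1
        trajectory.append(("say", answer[k]))
    # Reasoning steps not needed to support any answer piece are dropped
    # in this sketch.
    return trajectory


def steps_before_first_disclosure(trajectory: List[Segment]) -> int:
    """Count private segments emitted before the first disclosed one."""
    for i, (channel, _) in enumerate(trajectory):
        if channel == "say":
            return i
    return len(trajectory)


# Toy example: each answer piece needs one specific reasoning step.
reasoning = ["a = 2", "b = 3", "a + b = 5"]
answer = ["a is 2.", "The sum is 5."]
needs = {"a is 2.": "a = 2", "The sum is 5.": "a + b = 5"}


def toy_supports(reasoning_prefix: List[str], answer_prefix: List[str]) -> bool:
    return all(needs[piece] in reasoning_prefix for piece in answer_prefix)


trajectory = align_and_interleave(reasoning, answer, toy_supports)
delay = steps_before_first_disclosure(trajectory)
```

In this toy case the first answer piece is disclosed after one private step, whereas a think-then-answer layout would emit all three reasoning steps first. That gap is the kind of delay the abstract calls the silence tax.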
