Thinking-while-speaking: A Controlled, Interleaved Reasoning Method for Real-Time Speech Generation

Xuan Du; Qiangyu Yan; Wenshuo Li; Borui Jiang; Changming Xiao; Han Shu; Xinghao Chen

arXiv:2605.20946·cs.CL·May 21, 2026

TL;DR

This paper introduces InterRS, a novel method for real-time speech generation that interleaves reasoning steps with speech, improving fluency and reasoning accuracy in AI communication.

Contribution

The paper presents a new pipeline for generating interleaved reasoning and speech data, along with training techniques that enhance naturalness and reasoning performance in speech generation.

Findings

01 Achieves 13% better performance on mathematical and logic benchmarks.

02 Generates instant, fluent responses comparable to spoken-language models.

03 Produces more natural and fluent answers than prior methods.

Abstract

The thinking-while-speaking paradigm aims to make AI communication more human. A key challenge is maintaining fluent speech while performing deep reasoning. Our method, InterRS, tackles this by inserting reasoning steps only during natural speech generation. This requires high-quality data in which reasoning and speech are precisely aligned and the length ratio is controlled. We introduce a novel pipeline to generate such seamlessly interleaved audio data. To train our model, we combine interleaved SFT on refined data with reinforcement learning using two new rewards: a TA-Balance Reward to manage timing and the thinking-answer ratio, and a Linguistic Quality Reward to refine expression. Experiments show that our approach achieves 13% better performance on mathematical and logic benchmarks while generating instant responses, like a spoken-language instruct model which outputs fast CoT…
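
The abstract does not define the TA-Balance Reward, so the following is only a toy illustration of what a thinking-answer length ratio over an interleaved sequence could look like. The segment representation, the function names `thinking_answer_ratio` and `toy_balance_reward`, and the `target_ratio` and `tolerance` parameters are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: each segment is (kind, token_count),
# where kind is "think" (silent reasoning) or "answer" (spoken output).
Segment = Tuple[str, int]


def thinking_answer_ratio(segments: List[Segment]) -> float:
    """Return total thinking tokens divided by total answer tokens."""
    think = sum(n for kind, n in segments if kind == "think")
    answer = sum(n for kind, n in segments if kind == "answer")
    if answer == 0:
        # No spoken output at all: the ratio is unbounded if anything was thought.
        return float("inf") if think > 0 else 0.0
    return think / answer


def toy_balance_reward(
    segments: List[Segment],
    target_ratio: float = 1.0,  # assumed target, not from the paper
    tolerance: float = 0.5,     # assumed width of the linear falloff
) -> float:
    """Toy reward: 1.0 at the target ratio, falling linearly to 0.0."""
    ratio = thinking_answer_ratio(segments)
    if ratio == float("inf"):
        return 0.0
    deviation = abs(ratio - target_ratio)
    return max(0.0, 1.0 - deviation / tolerance)


if __name__ == "__main__":
    sample = [("answer", 10), ("think", 10), ("answer", 10)]
    print(thinking_answer_ratio(sample))
    print(toy_balance_reward(sample, target_ratio=0.5))
```

A reward of this shape only captures the ratio part; the timing aspect mentioned in the abstract would need a separate term, which is not sketched here.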
