Asynchronous Reasoning: Training-Free Interactive Thinking LLMs
George Yakushev, Nataliia Babina, Masoud Vahid Dastgerdi, Vyacheslav Zhdanovskiy, Denis Kuznedelev, Alina Shutova, Max Ryabinin

TL;DR
This paper introduces a training-free method that lets reasoning LLMs think, listen, and respond asynchronously by exploiting properties of positional embeddings, which makes real-time interaction possible and cuts response delays.
Contribution
It proposes an approach that lets reasoning-capable LLMs operate asynchronously without additional training, in the way humans keep thinking while they listen and respond.
Findings
Reduces time to first non-thinking token from minutes to 5 seconds or less.
Achieves up to a 12-fold reduction in overall response delay.
Maintains accurate reasoning in math, commonsense, and safety tasks.
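To make the "time to first non-thinking token" metric concrete, here is a minimal sketch of how it could be measured from a timestamped token stream. The function name, the event format, and the timestamps are hypothetical and chosen only for illustration; they are not the paper's measurements or its evaluation code.

```python
def time_to_first_answer_token(events):
    """Return the timestamp of the first token that is not a thinking token.

    `events` is a list of (seconds_since_request, is_thinking) pairs,
    one per generated token, in generation order (hypothetical format).
    Returns None if the stream contains only thinking tokens.
    """
    for timestamp, is_thinking in events:
        if not is_thinking:
            return timestamp
    return None


# Synthetic streams, not measured data.
# Sequential: the model finishes thinking before it writes anything.
sequential = [(0.5, True), (60.0, True), (120.0, True), (121.0, False)]
# Asynchronous: answer tokens are emitted while thinking continues.
interleaved = [(0.5, True), (1.0, False), (60.0, True), (61.0, False)]

print(time_to_first_answer_token(sequential))
print(time_to_first_answer_token(interleaved))
```

In the sequential stream the user waits for the entire thinking phase, while in the interleaved stream the first visible token arrives almost immediately even though the total amount of thinking is similar.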
Abstract
Many state-of-the-art LLMs are trained to think before giving their answer. Reasoning can greatly improve language model capabilities, but it also makes them less interactive: given a new input, a model must stop thinking before it can respond. Real-world use cases such as voice-based or embodied assistants require an LLM agent to respond and adapt to additional information in real time, which is incompatible with sequential interactions. In contrast, humans can listen, think, and act asynchronously: we begin thinking about the problem while reading it and continue thinking while formulating the answer. In this work, we augment LLMs capable of reasoning to operate in a similar way without additional training. Our method uses the properties of positional embeddings to enable LLMs built for sequential generation to simultaneously think, listen, and write outputs. We evaluate our approach…
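The abstract does not say which property of positional embeddings the method relies on. As an illustration only, the sketch below assumes rotary position embeddings (RoPE), which many current LLMs use, and shows one well-known property of them: the attention score between a query and a key depends only on their relative offset, not on their absolute positions. The two-dimensional vectors, the `theta` value, and the function names are all hypothetical, and this is not the authors' implementation.

```python
import math


def rotate(vec, position, theta=0.1):
    """Apply a RoPE-style rotation to a 2-D vector at the given position."""
    angle = position * theta
    x, y = vec
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


def score(query, key, q_pos, k_pos):
    """Dot product between a position-rotated query and a position-rotated key."""
    qx, qy = rotate(query, q_pos)
    kx, ky = rotate(key, k_pos)
    return qx * kx + qy * ky


query, key = (1.0, 2.0), (0.5, -1.5)

# Same relative offset (3) at very different absolute positions.
near = score(query, key, q_pos=10, k_pos=7)
shifted = score(query, key, q_pos=1010, k_pos=1007)

# A different relative offset (5) for comparison.
other = score(query, key, q_pos=10, k_pos=5)

print(abs(near - shifted) < 1e-9)  # scores match when the offset matches
print(abs(near - other) > 1e-6)    # scores differ when the offset differs
```

Under this assumption, a block of tokens can be treated as sitting at a different place in the sequence as long as the relative offsets the model sees stay consistent, which is the kind of flexibility a model would need to interleave thinking, listening, and writing.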
