Speculative Interaction Agents: Building Real-Time Agents with Asynchronous I/O and Speculative Tool Calling
Coleman Hooper, Minwoo Kang, Suhong Moon, Nicholas Lee, Eric Wen, John Wawrzynek, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami, Kurt Keutzer

TL;DR
This paper introduces Speculative Interaction Agents, which combine asynchronous I/O with speculative tool calling so that agents with complex multi-turn tool calling can respond in real time. Reported speedups are 1.3-1.7× on cloud APIs with minor accuracy loss, and 1.6-2.2× on small models.
Contribution
The paper presents novel methods for asynchronous processing and speculative tool calling to achieve real-time performance in agentic AI systems.
Findings
Achieves 1.3-1.7× speedups on cloud APIs with minor accuracy loss.
Demonstrates 1.6-2.2× speedups on small models across multiple benchmarks.
Proposes a clock-based training methodology for streaming inputs in edge models.
Abstract
There is a growing demand for agentic AI technologies for a range of downstream applications like customer service and personal assistants. For applications where the agent needs to interact with a person, real-time low-latency responsiveness is required; for example, with voice-controlled applications, under 1 second of latency is typically required for the interaction to feel seamless. However, if we want the LLM to reason and execute an agentic workflow with tool calling, this can add several seconds or more of latency, which is prohibitive for real-time latency-sensitive applications. In our work, we propose Speculative Interaction Agents to enable real-time interaction even for agents with complex multi-turn tool calling. We propose Asynchronous I/O, which decouples the core agent reason-and-act thread from waiting for additional information from either the user or environment,…
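To make the idea of decoupling the agent from input waiting more concrete, here is a minimal Python sketch of one possible reading of it. It is not the authors' implementation: the tool `lookup_weather`, the parser `extract_city`, and the driver `run_agent` are hypothetical names invented for illustration, and the sleeps merely stand in for network and input latency. The agent launches a tool call as soon as the partial user input suggests an argument, keeps consuming input chunks in the meantime, and reuses the speculative result only if the final input agrees with the guess.

```python
import asyncio
from typing import List, Optional, Tuple


async def lookup_weather(city: str) -> str:
    """Hypothetical tool; the sleep stands in for tool/network latency."""
    await asyncio.sleep(0.05)
    return f"sunny in {city}"


def extract_city(text: str) -> Optional[str]:
    """Hypothetical parser: treat the word after 'in' as the tool argument."""
    words = text.replace("?", "").split()
    if "in" in words and words.index("in") + 1 < len(words):
        return words[words.index("in") + 1]
    return None


async def run_agent(chunks: List[str]) -> Tuple[str, bool]:
    """Consume streamed input while a speculative tool call runs in the background.

    Returns the tool result and whether a speculative call was reused.
    """
    text = ""
    spec_city: Optional[str] = None
    spec_task: Optional[asyncio.Task] = None

    for chunk in chunks:
        await asyncio.sleep(0.01)  # simulated arrival of the next input chunk
        text += chunk
        city = extract_city(text)
        if city is not None and city != spec_city:
            # The guess changed: drop the stale speculation and start a new one.
            if spec_task is not None:
                spec_task.cancel()
            spec_city = city
            spec_task = asyncio.create_task(lookup_weather(city))

    final_city = extract_city(text)
    if final_city is None:
        raise ValueError("no tool argument found in input")
    if spec_task is not None and spec_city == final_city:
        return await spec_task, True  # speculation confirmed: reuse the result
    if spec_task is not None:
        spec_task.cancel()
    return await lookup_weather(final_city), False  # fall back to a fresh call


if __name__ == "__main__":
    print(asyncio.run(run_agent(["What is the weather ", "in Paris ", "today?"])))
```

In this sketch the tool call overlaps with the remaining input, so the latency visible after the user finishes is shorter than the tool's full latency. A mis-speculated call is simply cancelled, which is only safe for side-effect-free tools; how the paper handles tools with side effects is not stated in the text above.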
