Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution
Yifan Sui, Han Zhao, Rui Ma, Zhiyuan He, Hao Wang, Jianxun Li, Yuqing Yang

TL;DR
This paper introduces PASTE, a method that accelerates LLM agents by speculatively executing tools based on predictable patterns. It reduces average task completion time by 48.5% and improves tool execution throughput by 1.8x.
Contribution
PASTE leverages stable control flows and data dependencies in agent requests to enable speculative tool execution, addressing latency bottlenecks in LLM-powered agents.
Findings
Reduces average task completion time by 48.5%
Improves tool execution throughput by 1.8x
Outperforms state-of-the-art baselines
Abstract
LLM-powered agents are emerging as a dominant paradigm for autonomous task solving. Unlike standard inference workloads, agents operate in a strictly serial "LLM-tool" loop, where the LLM must wait for external tool execution at every step. This execution model introduces severe latency bottlenecks. To address this problem, we propose PASTE, a Pattern-Aware Speculative Tool Execution method designed to hide tool latency through speculation. PASTE is based on the insight that although agent requests are semantically diverse, they exhibit stable application-level control flows (recurring tool-call sequences) and predictable data dependencies (parameter passing between tools). By exploiting these properties, PASTE improves agent serving performance through speculative tool execution. Experimental results against state-of-the-art baselines show that PASTE reduces average task completion…
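The abstract describes the mechanism only at a high level. The following is a minimal Python sketch of pattern-aware speculation, assuming a simple frequency-based predictor; `PatternTable`, `Speculator`, `min_conf`, and the `search`/`fetch` tools are hypothetical illustrations, not PASTE's actual design or API. The table records which tool tends to follow which (control flow) and which output field feeds which argument (data dependency). After each tool finishes, the predicted next call is launched in a background thread while the LLM is still generating, and its result is reused only if the LLM's real call matches exactly. The sketch assumes tools are side-effect free, so a mispredicted call can simply be discarded.

```python
from collections import Counter, defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional, Tuple


class PatternTable:
    """Hypothetical store of tool-call patterns mined from past agent traces."""

    def __init__(self) -> None:
        # Control flow: transitions[prev][nxt] = times tool `nxt` followed tool `prev`.
        self.transitions: Dict[str, Counter] = defaultdict(Counter)
        # Data dependency: (prev, nxt) -> {argument of nxt: key in prev's output}.
        self.arg_maps: Dict[Tuple[str, str], Dict[str, str]] = {}

    def observe(self, prev: str, nxt: str, arg_map: Dict[str, str]) -> None:
        self.transitions[prev][nxt] += 1
        self.arg_maps[(prev, nxt)] = arg_map

    def predict(self, prev: str, min_conf: float) -> Optional[str]:
        """Return the most frequent successor of `prev` if its share >= min_conf."""
        counts = self.transitions.get(prev)
        if not counts:
            return None
        nxt, n = counts.most_common(1)[0]
        return nxt if n / sum(counts.values()) >= min_conf else None


class Speculator:
    """Hypothetical wrapper that runs the predicted next tool while the LLM thinks."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], table: PatternTable,
                 min_conf: float = 0.6) -> None:
        self.tools, self.table, self.min_conf = tools, table, min_conf
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.pending: Optional[Tuple[str, Dict[str, Any], Future]] = None
        self.hits = 0

    def on_tool_result(self, tool: str, output: Dict[str, Any]) -> None:
        """After `tool` finishes, launch the likely next call in the background."""
        self.pending = None
        nxt = self.table.predict(tool, self.min_conf)
        arg_map = self.table.arg_maps.get((tool, nxt)) if nxt else None
        if arg_map is None or not all(src in output for src in arg_map.values()):
            return  # no confident prediction, or a needed output field is missing
        args = {dst: output[src] for dst, src in arg_map.items()}
        self.pending = (nxt, args, self.pool.submit(self.tools[nxt], **args))

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        """Serve the tool call the LLM actually emitted."""
        pending, self.pending = self.pending, None
        if pending and pending[0] == tool and pending[1] == args:
            self.hits += 1
            return pending[2].result()  # hit: reuse the speculative result
        if pending:
            pending[2].cancel()  # miss: discard (cancel has no effect once started)
        return self.tools[tool](**args)


# Usage with two toy, side-effect-free tools.
calls = []


def search(query: str) -> Dict[str, str]:
    calls.append("search")
    return {"top_url": "https://example.com/" + query}


def fetch(url: str) -> str:
    calls.append("fetch")
    return "<html>" + url + "</html>"


table = PatternTable()
table.observe("search", "fetch", {"url": "top_url"})  # learned from an earlier trace
spec = Speculator({"search": search, "fetch": fetch}, table)

out = spec.call("search", {"query": "paste"})
spec.on_tool_result("search", out)  # fetch starts here, before the LLM asks for it
# ... the LLM would still be generating its next step at this point ...
page = spec.call("fetch", {"url": out["top_url"]})  # served from the speculative run
spec.pool.shutdown()
```

Requiring an exact match on both the tool name and its arguments keeps the sketch conservative: a misprediction only wastes background work and never changes what the agent observes. Tools with side effects (writes, payments, emails) would need to be excluded from speculation or sandboxed, which this sketch does not attempt.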
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Security and Verification in Computing
