Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution

Yifan Sui; Han Zhao; Rui Ma; Zhiyuan He; Hao Wang; Jianxun Li; Yuqing Yang

arXiv:2603.18897 · cs.DC · March 20, 2026

Open Access

TL;DR

This paper introduces PASTE, a method that accelerates LLM agents by speculatively executing tools based on predictable patterns, reducing average task completion time by 48.5% and improving tool execution throughput by 1.8x.

Contribution

PASTE leverages stable control flows and data dependencies in agent requests to enable speculative tool execution, addressing latency bottlenecks in LLM-powered agents.
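
To make the idea of a stable control flow concrete, here is a minimal sketch of predicting the next tool from recurring tool-call sequences. It is an illustration only, not PASTE's actual predictor: the `NextToolPredictor` class, the first-order transition counts, the `min_confidence` threshold, and the tool names are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


class NextToolPredictor:
    """Hypothetical first-order model of recurring tool-call sequences."""

    def __init__(self, min_confidence=0.6):
        # Assumed threshold: only predict when one successor clearly dominates.
        self.min_confidence = min_confidence
        # transitions[prev_tool][next_tool] = number of times seen in traces
        self.transitions = defaultdict(Counter)

    def observe(self, trace):
        """Record every adjacent (previous tool, next tool) pair in a trace."""
        for prev_tool, next_tool in zip(trace, trace[1:]):
            self.transitions[prev_tool][next_tool] += 1

    def predict(self, last_tool):
        """Return the most likely next tool, or None if no confident guess."""
        counts = self.transitions.get(last_tool)
        if not counts:
            return None
        tool, hits = counts.most_common(1)[0]
        confidence = hits / sum(counts.values())
        return tool if confidence >= self.min_confidence else None


# Illustrative traces with made-up tool names.
predictor = NextToolPredictor()
for trace in [
    ["search", "fetch", "summarize"],
    ["search", "fetch", "extract"],
    ["search", "fetch", "summarize"],
]:
    predictor.observe(trace)

print(predictor.predict("search"))     # "fetch" follows "search" in every trace
print(predictor.predict("summarize"))  # no successor was ever observed
```

A prediction like this only says which tool is likely to come next; the tool's arguments would still have to be derived from earlier tool outputs, which is the data-dependency half of the insight described above.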

Findings

01 Reduces average task completion time by 48.5%

02 Improves tool execution throughput by 1.8x

03 Outperforms state-of-the-art baselines

Abstract

LLM-powered agents are emerging as a dominant paradigm for autonomous task solving. Unlike standard inference workloads, agents operate in a strictly serial "LLM-tool" loop, where the LLM must wait for external tool execution at every step. This execution model introduces severe latency bottlenecks. To address this problem, we propose PASTE, a Pattern-Aware Speculative Tool Execution method designed to hide tool latency through speculation. PASTE is based on the insight that although agent requests are semantically diverse, they exhibit stable application-level control flows (recurring tool-call sequences) and predictable data dependencies (parameter passing between tools). By exploiting these properties, PASTE improves agent serving performance through speculative tool execution. Experimental results against state-of-the-art baselines show that PASTE reduces average task completion…
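
The general mechanism of hiding tool latency behind LLM thinking time can be sketched with `asyncio`. This is a hedged illustration of speculative execution in general, not the paper's implementation: `slow_tool`, `llm_decide`, `step_with_speculation`, the delays, and the tool names are hypothetical stand-ins, and the sketch assumes the speculated tool has no side effects, so a wrong guess can simply be discarded.

```python
import asyncio


async def slow_tool(name, arg, delay=0.05):
    """Stand-in for an external tool call (assumed side-effect free)."""
    await asyncio.sleep(delay)
    return f"{name}({arg})"


async def llm_decide(delay=0.05):
    """Stand-in for the LLM step that chooses the next tool call."""
    await asyncio.sleep(delay)
    return ("fetch", "doc-1")


async def step_with_speculation(predicted_call):
    """Run the predicted tool call while the LLM is still deciding."""
    # Launch the guess immediately so it overlaps with LLM latency.
    spec_task = asyncio.create_task(slow_tool(*predicted_call))
    actual_call = await llm_decide()
    if actual_call == predicted_call:
        # Hit: the result is ready, or nearly so, when the LLM finishes.
        return await spec_task, True
    # Miss: drop the speculative work and fall back to the serial path.
    spec_task.cancel()
    return await slow_tool(*actual_call), False


def run_step(predicted_call):
    """Synchronous helper for trying the sketch from a script."""
    return asyncio.run(step_with_speculation(predicted_call))


print(run_step(("fetch", "doc-1")))  # correct guess: speculation is reused
print(run_step(("search", "q")))     # wrong guess: falls back to a real call
```

On a correct guess the tool runs concurrently with the LLM step, so the step costs roughly the longer of the two delays rather than their sum. On a wrong guess the step costs about the same as the serial loop, plus whatever resources the discarded call consumed.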

Taxonomy

Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Security and Verification in Computing