Optimizing Agentic Language Model Inference via Speculative Tool Calls

Daniel Nichols; Prajwal Singhania; Charles Jekel; Abhinav Bhatele; Harshitha Menon

arXiv:2512.15834 · cs.PL · December 19, 2025

Open Access

TL;DR

This paper presents systems optimizations for language model inference that speculate tool calls and keep sequences resident in the inference engine, improving throughput by several hundred tokens per second when hosting inference for tool-using LM agents.

Contribution

The paper introduces speculation-based optimizations and a tool cache API that reduce the inference bottlenecks caused by tool calls in tool-using language models.

Findings

01 Throughput improved by several hundred tokens per second

02 Theoretical analysis guides optimal speculation configurations

03 Proposed API facilitates adoption of optimizations

Abstract

Language models (LMs) are becoming increasingly dependent on external tools. LM-based agentic frameworks frequently interact with their environment via such tools to search files, run code, call APIs, etc. Further, modern reasoning-based LMs use tools such as web search and Python code execution to enhance their reasoning capabilities. While tools greatly improve the capabilities of LMs, they also introduce performance bottlenecks during the inference process. In this paper, we introduce novel systems optimizations to address such performance bottlenecks by speculating tool calls and forcing sequences to remain resident in the inference engine to minimize overheads. Our optimizations lead to throughput improvements of several hundred tokens per second when hosting inference for LM agents. We provide a theoretical analysis of our algorithms to provide insights into speculation…
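The general idea of speculating a tool call can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class `SpeculativeToolCache`, its methods `speculate` and `resolve`, and the example tool `word_count` are all hypothetical names introduced here. The sketch assumes that a likely tool call can be guessed before the model finishes emitting it, and that the tool is free of side effects so that running it early is safe.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# A tool call is identified by (tool name, argument string). Hypothetical format.
ToolCall = Tuple[str, str]


class SpeculativeToolCache:
    """Illustrative cache that starts predicted tool calls ahead of time."""

    def __init__(self, tools: Dict[str, Callable[[str], str]], max_workers: int = 4):
        self.tools = tools
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.inflight: Dict[ToolCall, Future] = {}
        self.hits = 0
        self.misses = 0

    def speculate(self, call: ToolCall) -> None:
        """Start a predicted call in the background while decoding continues."""
        if call not in self.inflight:
            name, arg = call
            self.inflight[call] = self.pool.submit(self.tools[name], arg)

    def resolve(self, call: ToolCall) -> str:
        """Return the result for the call the model actually emitted."""
        future = self.inflight.pop(call, None)
        if future is not None:
            # Correct speculation: the tool latency overlapped with decoding.
            self.hits += 1
            return future.result()
        # Misprediction: fall back to running the tool synchronously.
        self.misses += 1
        name, arg = call
        return self.tools[name](arg)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)


def word_count(text: str) -> str:
    """Toy side-effect-free tool used only for this example."""
    return str(len(text.split()))


cache = SpeculativeToolCache({"word_count": word_count})
cache.speculate(("word_count", "speculate the tool call"))     # predicted early
result = cache.resolve(("word_count", "speculate the tool call"))  # actual call
cache.shutdown()
```

In this sketch a correct prediction hides the tool's latency behind token generation, while a wrong prediction costs only the wasted background work. Tools with side effects (file writes, external API mutations) would need a different policy, which the sketch does not address.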

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques · Big Data and Digital Economy