Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs
Guangyu Feng, Huanzhi Mao, Prabal Dutta, Joseph E. Gonzalez

TL;DR
AsyncFC is an execution-layer framework that enables asynchronous function calling for LLMs, reducing latency by overlapping decoding with function execution without modifying the model.
Contribution
The paper presents a framework that decouples LLM decoding from function execution, so that decoding can overlap with running calls and independent calls can run in parallel, without changing existing models or the standard synchronous function-calling protocol.
Findings
AsyncFC significantly reduces end-to-end task completion time on standard function-calling benchmarks and adapted software engineering benchmarks.
AsyncFC preserves task accuracy while improving efficiency.
LLMs can reason over symbolic futures for asynchronous interactions.
Abstract
Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these…
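The general idea of future-based function calling described in the abstract can be illustrated with a minimal Python sketch. This is not the AsyncFC implementation. The class `FutureTable`, its `submit` and `resolve` methods, the `$f1`-style handle format, and the three toy functions are all hypothetical names chosen for illustration. The sketch assumes each call returns a symbolic handle immediately, that a handle passed as an argument marks a dependency, and that the wrapped functions stay ordinary synchronous Python functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FutureTable:
    """Hypothetical execution layer mapping symbolic handles to running calls."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, fn, *args):
        """Start fn in the background and return a symbolic handle at once."""
        handle = f"$f{len(self._futures) + 1}"
        self._futures[handle] = self._pool.submit(self._run, fn, args)
        return handle

    def resolve(self, handle):
        """Block until the call behind the handle finishes; return its value."""
        return self._futures[handle].result()

    def _is_handle(self, value):
        return isinstance(value, str) and value in self._futures

    def _run(self, fn, args):
        # Wait only on the handles this call depends on, then invoke the
        # unmodified synchronous function with concrete values.
        concrete = [self.resolve(a) if self._is_handle(a) else a for a in args]
        return fn(*concrete)


# Toy stand-ins for slow tools (hypothetical).
def fetch_weather(city):
    time.sleep(0.2)
    return f"sunny in {city}"


def fetch_news(city):
    time.sleep(0.2)
    return f"no news in {city}"


def summarize(weather, news):
    return f"{weather}; {news}"


def demo():
    table = FutureTable()
    start = time.perf_counter()
    w = table.submit(fetch_weather, "Paris")  # independent call
    n = table.submit(fetch_news, "Paris")     # independent call, runs alongside w
    s = table.submit(summarize, w, n)         # depends on both handles
    submit_time = time.perf_counter() - start  # the caller was not blocked
    result = table.resolve(s)                  # block only when the value is needed
    total = time.perf_counter() - start
    return (w, n, s), result, submit_time, total


if __name__ == "__main__":
    print(demo())
```

In this sketch the caller gets all three handles back before any tool finishes, which is the point at which a model could keep decoding. The two independent fetches run concurrently, and `summarize` starts only once both of its inputs are available.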
