Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

Guangyu Feng; Huanzhi Mao; Prabal Dutta; Joseph E. Gonzalez

arXiv:2605.15077·cs.CL·May 15, 2026

TL;DR

AsyncFC is an execution-layer framework that enables asynchronous function calling in LLMs, reducing latency by overlapping decoding with function execution, without model modifications.

Contribution

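The paper presents a framework that decouples LLM decoding from function execution, so decoding can overlap with in-flight calls and independent calls can run in parallel, without changing existing models, function implementations, or the standard synchronous function-calling protocol.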
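
As a rough illustration of the future-based idea, the sketch below wraps unmodified synchronous functions so that each call returns a symbolic handle immediately and runs in the background. It is not the authors' implementation: the `FutureRegistry` class, the `future_N` handle format, and the three toy tools are all hypothetical names chosen for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FutureRegistry:
    """Hypothetical execution layer that runs synchronous tools in the background."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}  # symbolic handle -> concurrent.futures.Future

    def call(self, fn, *args):
        """Submit fn and return a symbolic handle without waiting for the result."""
        handle = f"future_{len(self._futures) + 1}"  # assumed handle format

        def run():
            # Handles passed as arguments are resolved inside the worker,
            # so the caller is never blocked on a dependency.
            return fn(*[self.resolve(a) for a in args])

        self._futures[handle] = self._pool.submit(run)
        return handle

    def resolve(self, value):
        """Return the concrete result for a handle; pass other values through."""
        fut = self._futures.get(value) if isinstance(value, str) else None
        return fut.result() if fut is not None else value


# Toy synchronous tools (stand-ins for real, unmodified function implementations).
def get_weather(city):
    time.sleep(0.3)
    return f"sunny in {city}"


def get_time(city):
    time.sleep(0.3)
    return f"noon in {city}"


def summarize(weather, clock):
    return f"{weather}; {clock}"


def demo():
    layer = FutureRegistry()
    start = time.perf_counter()
    w = layer.call(get_weather, "Paris")  # returns a handle at once
    t = layer.call(get_time, "Paris")     # independent call, runs in parallel
    s = layer.call(summarize, w, t)       # depends on both earlier handles
    issued = time.perf_counter() - start  # time spent issuing calls only
    result = layer.resolve(s)             # block only when the value is needed
    total = time.perf_counter() - start
    return result, issued, total


if __name__ == "__main__":
    print(demo())
```

In this sketch the two independent 0.3 s calls overlap, so the total wall time is close to one call rather than two, and the caller (standing in for the decoding loop) is free to continue while they run. How AsyncFC itself represents futures to the model and tracks dependencies is not specified on this page.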

Findings

01. AsyncFC significantly reduces end-to-end task completion time.

02. AsyncFC preserves task accuracy while improving efficiency.

03. LLMs can reason over symbolic futures for asynchronous interactions.

Abstract

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, resulting in increasing end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these…
