ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System
Hao Kang, Ziyang Li, Xinyu Yang, Weili Xu, Yinfang Chen, Junxiong Wang, Beidi Chen, Tushar Krishna, Chenfeng Xu, and Simran Arora

TL;DR
ThunderAgent is an inference system that improves resource management for agentic large language model (LLM) workflows, raising throughput and memory efficiency by abstracting workflows as LLM Programs.
Contribution
It introduces a program-aware abstraction and scheduling framework that enhances resource utilization and performance in agentic LLM workflows.
Findings
Achieves 1.5-3.6x throughput improvements in serving tasks.
Attains 1.8-3.9x speedup in RL rollout scenarios.
Achieves up to 4.2x disk memory savings compared to existing systems.
Abstract
Large language models (LLMs) are now used to power complex multi-turn agentic workflows. Existing systems run agentic inference by loosely assembling isolated components: an LLM inference engine (e.g., vLLM) and a tool orchestrator (e.g., Kubernetes). Although agentic workflows involve multiple LLM and tool requests, these systems schedule and allocate resources separately on a per-request basis, without end-to-end knowledge of the workflow. This leads to sub-optimal management of KV caches and tool execution environments. To address these challenges, we propose ThunderAgent, a fast, simple, and program-aware agentic inference system. We first abstract agentic workflows as LLM Programs, enabling a unified view of heterogeneous resources, including KV caches, system states, and external tool assets such as disk memory and network ports. Built upon this abstraction, ThunderAgent introduces a…
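To make the "LLM Program" idea concrete, the sketch below shows one way a single workflow-level record could track KV cache, disk, and network ports together, so they can be accounted for and released as a unit rather than per request. It is an illustration only, not ThunderAgent's implementation: the names `LLMProgram`, `ToolAssets`, `record_llm_turn`, `record_tool_call`, and `release`, and all fields and numbers, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToolAssets:
    """External resources held by a workflow's tool environment (hypothetical fields)."""
    disk_mb: int = 0
    ports: Set[int] = field(default_factory=set)


@dataclass
class LLMProgram:
    """One multi-turn agentic workflow tracked as a single unit (illustrative only)."""
    program_id: str
    kv_cache_blocks: int = 0  # KV cache retained across LLM turns
    tools: ToolAssets = field(default_factory=ToolAssets)
    turns: int = 0
    finished: bool = False

    def record_llm_turn(self, new_kv_blocks: int) -> None:
        # Each LLM request in the workflow grows the program's KV footprint.
        self.kv_cache_blocks += new_kv_blocks
        self.turns += 1

    def record_tool_call(self, disk_mb: int = 0, port: Optional[int] = None) -> None:
        # Tool requests accumulate external assets owned by the same program.
        self.tools.disk_mb += disk_mb
        if port is not None:
            self.tools.ports.add(port)

    def release(self) -> Dict[str, int]:
        # Free every resource the program holds in one step and report the totals.
        freed = {
            "kv_cache_blocks": self.kv_cache_blocks,
            "disk_mb": self.tools.disk_mb,
            "ports": len(self.tools.ports),
        }
        self.kv_cache_blocks = 0
        self.tools = ToolAssets()
        self.finished = True
        return freed


# Hypothetical workflow: two LLM turns with one tool call in between.
prog = LLMProgram("task-0")
prog.record_llm_turn(new_kv_blocks=12)
prog.record_tool_call(disk_mb=256, port=8080)
prog.record_llm_turn(new_kv_blocks=4)
freed = prog.release()
```

A per-request system sees each of these three calls in isolation. Grouping them under one record is what would let a scheduler reason about the workflow's total footprint and lifetime.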
Taxonomy
Topics: Machine Learning in Materials Science · Big Data and Digital Economy · Scientific Computing and Data Management
