Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents

Hongqiu Ni; Jiabao Zhang; Guopeng Li; Zilong Wang; Ruiqi Wu; Chi Zhang; Haisheng Tan

arXiv:2512.14142 · cs.CL · December 17, 2025

TL;DR

Astraea is a state-aware scheduling engine that optimizes the entire lifecycle of LLM-powered agent workflows, significantly reducing latency and improving robustness through hierarchical scheduling and adaptive caching.

Contribution

It introduces a global, state-aware scheduling approach for LLM agents that outperforms existing local-segment optimization systems in reducing end-to-end latency.
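The gap between per-segment and lifecycle-level optimization can be seen in a toy simulation. This is a minimal sketch under stated assumptions, not Astraea's scheduler: a single accelerator runs LLM segments non-preemptively, external API calls run off the accelerator with no contention, and both policies are hypothetical stand-ins. The segment-local policy is FCFS on each segment's ready time. The global policy ranks jobs by the LLM work remaining over their whole lifecycle. The names `Job`, `simulate`, and `remaining_llm_work` are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    arrival: float
    segments: tuple  # ((kind, duration), ...) with kind "llm" or "io"

def remaining_llm_work(job: Job, idx: int) -> float:
    """Accelerator time the job still needs from segment `idx` onward."""
    return sum(d for kind, d in job.segments[idx:] if kind == "llm")

def _advance_io(job: Job, idx: int, t: float):
    """Assumption: external calls run off the accelerator, so they never queue."""
    while idx < len(job.segments) and job.segments[idx][0] == "io":
        t += job.segments[idx][1]
        idx += 1
    return idx, t

def simulate(jobs, key):
    """Non-preemptive, single-accelerator simulation; returns {name: JCT}.

    `key(job, idx, ready_time)` ranks ready jobs (lowest runs first);
    ties go to the job listed earlier in `jobs`.
    """
    by_name = {j.name: j for j in jobs}
    # name -> (index of next segment, time at which that segment is ready)
    state = {j.name: _advance_io(j, 0, j.arrival) for j in jobs}
    jct = {}
    clock = 0.0
    while state:
        # Retire jobs with no segments left; JCT spans the whole lifecycle.
        for name, (idx, t) in list(state.items()):
            if idx == len(by_name[name].segments):
                jct[name] = t - by_name[name].arrival
                del state[name]
        if not state:
            break
        ready = [n for n, (_, t) in state.items() if t <= clock]
        if not ready:
            clock = min(t for _, t in state.values())  # idle until a job is ready
            continue
        name = min(ready, key=lambda n: key(by_name[n], *state[n]))
        job = by_name[name]
        idx, _ = state[name]
        clock += job.segments[idx][1]  # run one LLM segment to completion
        state[name] = _advance_io(job, idx + 1, clock)
    return jct

jobs = [
    Job("A", 0.0, (("llm", 4),)),
    Job("B", 0.0, (("llm", 1), ("io", 2), ("llm", 1))),
]
fcfs = simulate(jobs, key=lambda job, idx, t: t)
glob = simulate(jobs, key=lambda job, idx, t: remaining_llm_work(job, idx))
print(fcfs, sum(fcfs.values()) / len(fcfs))  # {'A': 4.0, 'B': 8.0} 6.0
print(glob, sum(glob.values()) / len(glob))  # {'A': 5.0, 'B': 6.0} 5.5
```

Under the global ordering, job B issues its API call early and job A occupies the accelerator while B waits on the external service. Average JCT drops from 6.0 to 5.5 time units even though no individual segment runs faster.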

Findings

1. Reduces average Job Completion Time by up to 25.5%.
2. Demonstrates robustness under high load conditions.
3. Effectively balances efficiency and fairness in scheduling.

Abstract

Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services such as Web APIs, create a mismatch between their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically optimize each segment in isolation, which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift the optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. It dynamically classifies requests by their I/O…
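The abstract describes a hierarchical scheduler that combines a request's history with predictions about its future, but this summary does not give the exact rule. The abstract is cut off before the classification criterion. As a purely illustrative sketch, the block below substitutes a well-known technique: multi-level feedback queue (MLFQ) style levels keyed on attained service, with requests ordered inside each level by a predicted remaining time. `AgentRequest`, `LEVEL_BOUNDS`, `level`, and `dispatch_order` are hypothetical names, and the boundary values are arbitrary.

```python
import heapq
from dataclasses import dataclass

# Hypothetical level boundaries on attained service (seconds), MLFQ-style.
LEVEL_BOUNDS = (1.0, 4.0, 16.0)

@dataclass(frozen=True)
class AgentRequest:
    req_id: str
    attained: float             # history: accelerator time used across ALL past segments
    predicted_remaining: float  # future: estimate from some (hypothetical) predictor

def level(req: AgentRequest) -> int:
    """Coarse level from history: requests with more attained service sink lower."""
    for i, bound in enumerate(LEVEL_BOUNDS):
        if req.attained < bound:
            return i
    return len(LEVEL_BOUNDS)

def dispatch_order(reqs):
    """Order ready requests by level first, then by predicted remaining work."""
    heap = [(level(r), r.predicted_remaining, r.req_id) for r in reqs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

reqs = [
    AgentRequest("r1", attained=0.5, predicted_remaining=6.0),
    AgentRequest("r2", attained=5.0, predicted_remaining=0.5),
    AgentRequest("r3", attained=0.2, predicted_remaining=2.0),
    AgentRequest("r4", attained=2.0, predicted_remaining=1.0),
]
print(dispatch_order(reqs))  # ['r3', 'r1', 'r4', 'r2']
```

In this sketch, `attained` accumulates over the whole request lifecycle. A request returning from a Web API call therefore keeps its history instead of re-entering the queue as a fresh request, which is what a segment-local scheduler would do. Letting history dominate, as here, demotes `r2` despite its short prediction. How to weigh the two signals is the kind of design question a real state-aware scheduler has to settle.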

Taxonomy

Topics: Cloud Computing and Resource Management · Big Data and Digital Economy · IoT and Edge/Fog Computing