Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents
Hongqiu Ni, Jiabao Zhang, Guopeng Li, Zilong Wang, Ruiqi Wu, Chi Zhang, Haisheng Tan

TL;DR
Astraea is a state-aware scheduling engine that optimizes the entire lifecycle of LLM-powered agent workflows, significantly reducing latency and improving robustness through hierarchical scheduling and adaptive caching.
Contribution
It introduces a global, state-aware scheduling approach for LLM agents that outperforms existing local-segment optimization systems in reducing end-to-end latency.
Findings
Reduces average Job Completion Time by up to 25.5%.
Demonstrates robustness under high load conditions.
Effectively balances efficiency and fairness in scheduling.
Abstract
Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network services like Web APIs, introduce a mismatch between their execution pattern and the scheduling granularity of existing inference systems such as vLLM. Existing systems typically focus on per-segment optimization, which prevents them from minimizing the end-to-end latency of the complete agentic workflow, i.e., the global Job Completion Time (JCT) over the entire request lifecycle. To address this limitation, we propose Astraea, a service engine designed to shift the optimization from local segments to the global request lifecycle. Astraea employs a state-aware, hierarchical scheduling algorithm that integrates a request's historical state with future predictions. It dynamically classifies requests by their I/O…
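To make the per-segment versus whole-lifecycle distinction concrete, the toy sketch below runs two multi-segment requests on a single worker under two orderings: one that looks only at the next segment, and one that looks at the total remaining work. It is an illustration under strong assumptions, not Astraea's algorithm: the request names, segment costs, and helper functions are all hypothetical, every request arrives at time 0 (so JCT equals completion time), segment costs are known exactly rather than predicted, and waits on external APIs are ignored.

```python
# Toy comparison of per-segment vs. lifecycle-aware ordering on one worker.
# All names and numbers are illustrative assumptions, not taken from the paper.

# Each request is a list of compute-segment costs (arbitrary time units).
REQUESTS = {
    "A": [1, 1, 1, 6],  # several short segments, then a long one (total 9)
    "B": [2],           # a single short segment (total 2)
}


def next_segment_cost(segments):
    """Local view: only the cost of the next segment is considered."""
    return segments[0]


def remaining_work(segments):
    """Global view: the cost of every segment still to run is considered."""
    return sum(segments)


def simulate(requests, key):
    """Run segments one at a time, always picking the request with the
    smallest key. Returns {request name: completion time}."""
    pending = {name: list(segs) for name, segs in requests.items()}
    clock = 0.0
    finished = {}
    while pending:
        name = min(pending, key=lambda n: key(pending[n]))
        clock += pending[name].pop(0)   # execute that request's next segment
        if not pending[name]:           # no segments left: request is done
            finished[name] = clock
            del pending[name]
    return finished


def mean_jct(finished):
    """Average completion time; equals average JCT since arrivals are at 0."""
    return sum(finished.values()) / len(finished)


if __name__ == "__main__":
    for label, key in [("per-segment", next_segment_cost),
                       ("lifecycle", remaining_work)]:
        result = simulate(REQUESTS, key)
        print(label, result, "mean JCT =", mean_jct(result))
```

In this toy case the per-segment ordering keeps choosing A's short segments and delays B until time 5, giving a mean JCT of 8.0, while the lifecycle ordering finishes B at time 2 and gives 6.5. A is done at time 11 either way.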
Taxonomy
Topics: Cloud Computing and Resource Management · Big Data and Digital Economy · IoT and Edge/Fog Computing
