FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
Zirui Huang, Yi-Xiang Hu, Feng Wu, Xiangyang Li

TL;DR
FATE is a scheduler for heterogeneous LLM workflows that accounts for the execution state its decisions leave behind for downstream stages. This reduces latency relative to existing heuristics and serving policies.
Contribution
FATE introduces future-state-aware scheduling, which accounts for the downstream effects of each placement decision. Compared with existing serving and DAG-scheduling policies, it reduces both latency and makespan in LLM workflows.
Findings
FATE reduces normalized makespan by 32.5% compared to RoundRobin.
FATE reduces normalized P95 latency by 32.3% relative to the baseline.
Joint future-state preservation improves scheduling efficiency.
Abstract
Large language model (LLM) applications are increasingly executed as heterogeneous multi-stage workflows rather than isolated inference calls. In these workflow directed acyclic graphs (DAGs), scheduling decisions affect not only the currently ready stage, but also the execution state inherited by downstream stages, including model residency, parent-output locality, prefix reuse, and future device reachability. Existing serving and DAG-scheduling policies mainly optimize immediate queue state, placement cost, or reuse signals in isolation, which can fragment useful state and increase end-to-end latency. We present FATE, a future-state-aware scheduler for heterogeneous LLM workflows. FATE combines a CP-SAT-backed frontier planner, horizon-aware candidate scoring, bounded multi-device shard execution, and state-conditional cost estimation. Rather than solving a monolithic full-DAG…
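To make "future-state-aware" concrete, here is a minimal, hypothetical sketch of horizon-aware candidate scoring. It is not FATE's actual scorer. It assumes a placement's score is its immediate cost plus a weighted estimate of what its children will cost, given the model residency and parent-output locality left behind. The names `Device`, `Stage`, `pick_device`, `horizon_weight`, and the cost constants are illustrative inventions. The sketch omits the CP-SAT frontier planner, prefix reuse, and multi-device sharding entirely.

```python
from dataclasses import dataclass, field

# Hypothetical cost constants (illustrative assumptions, not values from the paper).
MODEL_LOAD_COST = 5.0   # time to load a model that is not already resident
TRANSFER_COST = 2.0     # time to move a parent's output to another device

@dataclass
class Device:
    name: str
    queue_delay: float
    resident_models: set = field(default_factory=set)

@dataclass
class Stage:
    name: str
    model: str
    compute: float
    children: list = field(default_factory=list)

def immediate_cost(stage, device):
    """Cost of running `stage` on `device` now: queueing + model load (if needed) + compute."""
    load = 0.0 if stage.model in device.resident_models else MODEL_LOAD_COST
    return device.queue_delay + load + stage.compute

def downstream_cost(stage, device, devices):
    """Best-case cost of each child, given the state left by placing `stage` on `device`.

    Simplification: children's queueing delay is ignored; only model loads,
    parent-output transfers, and compute are counted.
    """
    total = 0.0
    for child in stage.children:
        best = float("inf")
        for d in devices:
            # After placement, `device` holds the parent's model and its output.
            resident = d.resident_models | ({stage.model} if d is device else set())
            load = 0.0 if child.model in resident else MODEL_LOAD_COST
            transfer = 0.0 if d is device else TRANSFER_COST
            best = min(best, load + transfer + child.compute)
        total += best
    return total

def pick_device(stage, devices, horizon_weight=1.0):
    """horizon_weight=0 gives a greedy, immediate-cost-only policy."""
    def score(d):
        return immediate_cost(stage, d) + horizon_weight * downstream_cost(stage, d, devices)
    return min(devices, key=score)

# Toy workflow: A (model m1) feeds B (model m2); only d0 already holds m2.
d0 = Device("d0", queue_delay=1.5, resident_models={"m1", "m2"})
d1 = Device("d1", queue_delay=0.0, resident_models={"m1"})
b = Stage("B", model="m2", compute=1.0)
a = Stage("A", model="m1", compute=1.0, children=[b])

print(pick_device(a, [d0, d1], horizon_weight=0.0).name)  # d1 (greedy)
print(pick_device(a, [d0, d1], horizon_weight=1.0).name)  # d0 (future-aware)
```

In this toy case, the greedy policy picks the idle device `d1` (score 1.0 vs. 2.5). The horizon-aware score prefers `d0` (3.5 vs. 4.0) because it keeps B's model and A's output on the same device. This mirrors, in miniature, the fragmentation effect the abstract describes.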
