Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
Yicheng Feng, Xin Tan, Yangtao Deng, Yimin Jiang, Yibo Zhu, Hong Xu

TL;DR
Frontier is a detailed discrete-event simulator designed to accurately model modern large language model inference serving, capturing complex architectures and optimizations to improve prediction accuracy and scalability.
Contribution
Frontier introduces a disaggregated abstraction and models key runtime optimizations, significantly improving simulation accuracy over existing tools for modern LLM serving systems.
Findings
Achieves less than 4% average throughput error on a 16-GPU testbed.
Reduces latency prediction error from over 44% to below 7%.
Scales to over 1,000 GPUs, enabling advanced workload analysis.
Abstract
Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions. We present Frontier, a discrete-event simulator for modern LLM inference serving. Frontier features a disaggregated abstraction. It captures the structure and dynamics of modern serving systems by modeling co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific…
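To make the contrast between discrete-event simulation and average-case analytical proxies concrete, here is a minimal sketch of a toy discrete-event loop for a single FIFO server. It is not Frontier's design or API: the function name `simulate`, the event kinds, and the arrival and service times are all hypothetical values chosen for illustration.

```python
import heapq
from collections import deque

ARRIVAL, DONE = 0, 1  # hypothetical event kinds for this toy model


def simulate(arrivals, service_times):
    """Toy discrete-event simulation of one FIFO server.

    Returns the end-to-end latency (finish time minus arrival time)
    of each request, indexed by request id.
    """
    # Event heap ordered by (time, kind, request id).
    events = [(t, ARRIVAL, rid) for rid, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = deque()  # requests queued behind the busy server
    busy = False
    latency = {}
    while events:
        now, kind, rid = heapq.heappop(events)
        if kind == DONE:
            latency[rid] = now - arrivals[rid]
            busy = False
        else:
            waiting.append(rid)
        # Start the next request whenever the server is idle.
        if not busy and waiting:
            nxt = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + service_times[nxt], DONE, nxt))
    return [latency[r] for r in range(len(arrivals))]


# Illustrative workload (made-up numbers): a burst of three requests, then one later.
arrivals = [0.0, 0.0, 0.0, 10.0]
service_times = [1.0, 1.0, 4.0, 1.0]

latencies = simulate(arrivals, service_times)
# Average-case proxy: assume every request only pays the mean service time.
proxy = sum(service_times) / len(service_times)

print("simulated latencies:", latencies)
print("worst simulated latency:", max(latencies))
print("average-case proxy:", proxy)
```

In this toy workload the proxy ignores queueing behind the burst, so it understates the worst-case latency that the event-driven run exposes. That is the kind of gap that matters when a prediction is checked against an SLA.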
