Frontier: Towards Comprehensive and Accurate LLM Inference Simulation

Yicheng Feng; Xin Tan; Yangtao Deng; Yimin Jiang; Yibo Zhu; Hong Xu

arXiv:2605.21312·cs.DC·May 21, 2026

TL;DR

Frontier is a discrete-event simulator for modern large language model (LLM) inference serving. It models disaggregated architectures and runtime optimizations to improve prediction accuracy, and it scales to clusters of over 1,000 GPUs.

Contribution

Frontier introduces a disaggregated abstraction and models key runtime optimizations, improving simulation accuracy over existing simulators for modern LLM serving systems.

Findings

01

Achieves less than 4% average throughput error on a 16-GPU testbed.

02

Reduces latency prediction error from over 44% to below 7%.

03

Scales to over 1,000 GPUs, enabling advanced workload analysis.

Abstract

Modern LLM serving is no longer homogeneous or monolithic. Production systems now combine disaggregated execution, complex parallelism, runtime optimizations, and stateful workloads such as reasoning, agents, and RL rollouts. Simulation is attractive for exploring this growing design space, yet existing simulators lack the architectural completeness and decision-grade fidelity it demands. Their monolithic-replica abstractions are ill-suited to disaggregated serving, while average-case analytical proxies can distort SLA predictions and even reverse optimization conclusions. We present Frontier, a discrete-event simulator for modern LLM inference serving. Frontier features a disaggregated abstraction. It captures the structure and dynamics of modern serving systems by modeling co-location, Prefill-Decode Disaggregation (PDD), and Attention-FFN Disaggregation (AFD) with role-specific…
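
To make the idea of discrete-event simulation of Prefill-Decode Disaggregation concrete, here is a minimal sketch in Python. It is not Frontier's implementation or API: the function `simulate_pdd`, its parameters, and the cost model (one prefill worker, one decode worker, FIFO queues, a fixed time per token, no batching, no KV-cache transfer cost) are all simplifying assumptions chosen for illustration.

```python
import heapq


def simulate_pdd(requests, prefill_time_per_token, decode_time_per_token):
    """Toy discrete-event simulation: one prefill worker feeds one decode worker.

    requests: list of (arrival_time, prompt_tokens, output_tokens).
    Returns a dict mapping request index to completion time.
    All names and the per-token cost model are illustrative assumptions.
    """
    events = []  # min-heap of (time, sequence_number, kind, request_id)
    seq = 0      # tie-breaker so simultaneous events keep insertion order
    for rid, (arrival, _, _) in enumerate(requests):
        heapq.heappush(events, (arrival, seq, "arrive", rid))
        seq += 1

    # Earliest time at which each role-specific worker is idle.
    free_at = {"prefill": 0.0, "decode": 0.0}
    done = {}

    while events:
        now, _, kind, rid = heapq.heappop(events)
        _, prompt_tokens, output_tokens = requests[rid]

        if kind == "arrive":
            # Queue behind any request already occupying the prefill worker.
            start = max(now, free_at["prefill"])
            finish = start + prompt_tokens * prefill_time_per_token
            free_at["prefill"] = finish
            heapq.heappush(events, (finish, seq, "prefill_done", rid))
            seq += 1
        elif kind == "prefill_done":
            # Hand off to the decode worker (transfer cost ignored here).
            start = max(now, free_at["decode"])
            finish = start + output_tokens * decode_time_per_token
            free_at["decode"] = finish
            heapq.heappush(events, (finish, seq, "decode_done", rid))
            seq += 1
        else:  # "decode_done"
            done[rid] = now

    return done


if __name__ == "__main__":
    completion = simulate_pdd(
        requests=[(0.0, 100, 10), (0.0, 50, 20)],
        prefill_time_per_token=0.01,
        decode_time_per_token=0.1,
    )
    for rid, t in sorted(completion.items()):
        print(f"request {rid} finished at t={t:.2f}")
```

In this sketch the second request waits first for the prefill worker and then for the decode worker, so its completion time reflects queueing at both stages. An average-case analytical estimate that only sums per-request service times would miss that delay, which is the kind of effect the abstract says can distort SLA predictions.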
