APEX: An Extensible and Dynamism-Aware Simulator for Automated Parallel Execution in LLM Serving
Yi-Chien Lin, Woosuk Kwon, Ronald Pineda, Fanny Nina Paravecino

TL;DR
APEX is a versatile simulator that efficiently identifies optimal parallel execution plans for large language model serving, reducing planning time and energy consumption while accommodating diverse models and workloads.
Contribution
We introduce APEX, an extensible, dynamism-aware simulation system that models LLM serving to rapidly find cost-effective, high-performance parallel execution strategies.
Findings
APEX finds optimal execution plans that run up to 3.37x faster than heuristic plans.
APEX can find plans that reduce energy consumption by up to 45% compared with latency-optimal plans.
APEX identifies an optimal plan within 15 minutes on a CPU, which is significantly faster and cheaper than evaluating candidate plans on a GPU deployment.
Abstract
Efficiently serving Large Language Models (LLMs) requires selecting an optimal parallel execution plan that balances computation, memory, and communication overhead. However, determining the best strategy is challenging because both the parallelism techniques (data, pipeline, tensor) and the workload characteristics vary (e.g., compute-intensive tasks with long prompts vs. memory-intensive tasks with long generation). We propose APEX, an LLM serving system simulator that efficiently identifies optimal parallel execution plans by considering key factors of LLM serving systems, such as memory usage and batching behavior. APEX performs dynamism-aware simulation to model iteration-level batching, and leverages LLMs' repetitive structure to reduce the design space, scaling efficiently to trillion-scale models. APEX abstracts the key components of LLM serving systems, including the model, batching module,…
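To make the design space mentioned in the abstract concrete, the following is a minimal sketch of how candidate plans combining data, pipeline, and tensor parallelism could be enumerated for a fixed device count. It is an illustration only, not APEX's actual search procedure: the names `Plan`, `divisors`, and `enumerate_plans` are hypothetical, and the sketch assumes each plan uses every device exactly once and that a pipeline cannot have more stages than the model has layers.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    """Hypothetical plan description (not APEX's data structure)."""
    data: int      # number of model replicas (data parallelism)
    pipeline: int  # number of pipeline stages (pipeline parallelism)
    tensor: int    # shards per stage (tensor parallelism)


def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_plans(num_devices: int, num_layers: int) -> list[Plan]:
    """List every (data, pipeline, tensor) split whose product is num_devices.

    Assumption: a pipeline stage holds at least one layer, so plans with
    more stages than layers are skipped.
    """
    plans = []
    for dp in divisors(num_devices):
        remaining = num_devices // dp
        for pp in divisors(remaining):
            tp = remaining // pp
            if pp <= num_layers:
                plans.append(Plan(dp, pp, tp))
    return plans


if __name__ == "__main__":
    for plan in enumerate_plans(num_devices=8, num_layers=32):
        print(plan)
```

A simulator in this style would then score each candidate (for example by estimated latency or energy) instead of deploying it on real hardware; the scoring step is deliberately left out here because the source does not specify it.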
