ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage
Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler

TL;DR
ATLAHS is a flexible, open-source network simulation toolchain that traces real-world AI, HPC, and distributed storage applications and simulates their workloads accurately, outperforming AstraSim in runtime and trace size efficiency.
Contribution
It introduces a flexible, application-centric simulation framework that supports multiple backends and scenarios, enabling more realistic and comprehensive performance analysis.
Findings
Simulates realistic workloads with consistently less than 5% error.
Outperforms AstraSim in runtime and trace size efficiency.
Demonstrates utility through case studies on congestion control and job placement.
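The error figure in the findings above can be read as a relative error between simulated and measured runtime. The sketch below shows one common way to compute such a metric. The exact metric used by the authors is not stated in this summary, so the formula, the function name `relative_error`, and the sample runtimes are all assumptions for illustration.

```python
def relative_error(simulated: float, measured: float) -> float:
    """Return |simulated - measured| / measured as a fraction."""
    if measured <= 0:
        raise ValueError("measured runtime must be positive")
    return abs(simulated - measured) / measured


# Hypothetical runtimes in seconds; these values are NOT taken from the paper.
measured_s = 120.0
simulated_s = 124.8

error = relative_error(simulated_s, measured_s)
print(f"relative error: {error:.1%}")
```

Under this reading, a simulated run would satisfy the reported bound whenever `error < 0.05`.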
Abstract
Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domains, limiting their ability to provide comprehensive performance insights. In this work, we introduce ATLAHS, a flexible, extensible, and open-source toolchain designed to trace real-world applications and accurately simulate their workloads. ATLAHS leverages the GOAL format to model communication and computation patterns in AI, HPC, and distributed storage applications. It supports multiple network simulation backends and handles multi-job and multi-tenant scenarios. Through extensive validation, we demonstrate that ATLAHS achieves high accuracy in simulating realistic workloads (consistently less than 5% error), while significantly outperforming AstraSim, the current state-of-the-art AI…
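To make the idea of modeling communication and computation patterns concrete, the sketch below represents a tiny two-rank workload as a dependency graph of compute, send, and receive operations and derives a completion time from it. This is not the GOAL syntax and not the ATLAHS implementation: the `Op` class, the operation names, and the fixed per-operation costs are hypothetical, and the model ignores network contention entirely.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Op:
    name: str
    kind: str                 # "calc", "send", or "recv" (hypothetical labels)
    cost_us: float            # assumed fixed cost in microseconds
    deps: list[str] = field(default_factory=list)


def finish_times(ops: list[Op]) -> dict[str, float]:
    """Finish time of each op when it starts as soon as all its deps finish."""
    by_name = {op.name: op for op in ops}
    # TopologicalSorter takes a mapping of node -> predecessors.
    order = TopologicalSorter({op.name: op.deps for op in ops}).static_order()
    done: dict[str, float] = {}
    for name in order:
        op = by_name[name]
        start = max((done[d] for d in op.deps), default=0.0)
        done[name] = start + op.cost_us
    return done


# Rank 0 computes, then sends; rank 1 receives, then computes.
ops = [
    Op("r0_calc", "calc", 100.0),
    Op("r0_send", "send", 20.0, deps=["r0_calc"]),
    Op("r1_recv", "recv", 0.0, deps=["r0_send"]),
    Op("r1_calc", "calc", 50.0, deps=["r1_recv"]),
]

times = finish_times(ops)
makespan = max(times.values())
print(f"makespan: {makespan} us")
```

A real network simulation backend would replace the fixed `cost_us` of each send with a value that depends on topology, congestion, and competing jobs, which is where the choice of backend matters.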
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Caching and Content Delivery
