System-performance and cost modeling of Large Language Model training and inference

Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, and Manu Perumkunnil

arXiv:2507.02456 · cs.AR · July 4, 2025

TL;DR

This paper presents a comprehensive performance and cost modeling framework for large language model training and inference, incorporating recent innovations and system design considerations to guide future hardware and software development.

Contribution

It introduces an integrated analytical modeling approach that combines recent compute, memory, and communication techniques with cost analysis for LLM systems.

Findings

01

The model effectively analyzes performance-cost trade-offs across different system architectures.

02

The performance model incorporates recent innovations such as flash attention and mixture of experts.

03

The analysis provides insights for optimizing hardware-software co-design of LLM systems.

Abstract

Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and adaptability. However, the exponential growth in LLM size and complexity has outpaced advancements in compute capacity, memory bandwidth, network performance, and cost efficiency, posing significant challenges to their scalability on distributed systems. To address these limitations, alternative model architectures, optimization strategies, communication-aware network topologies, and novel system design approaches have been proposed in the literature. This paper introduces a performance-cost modeling methodology for LLM training and inference that integrates state-of-the-art compute techniques with memory optimizations and the latest communication techniques. Building on an analytical performance…
