System-performance and cost modeling of Large Language Model training and inference
Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, and Manu Perumkunnil

TL;DR
This paper presents a comprehensive performance and cost modeling framework for large language model training and inference, incorporating recent innovations and system design considerations to guide future hardware and software development.
Contribution
It introduces an integrated analytical modeling approach that combines recent compute, memory, and communication techniques with cost analysis for LLM systems.
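As a rough illustration of what an analytical performance-cost model of this kind can look like, the sketch below estimates one step's time from compute, memory, and communication terms and converts it to a cost. This is not the paper's model: the roofline-style `max(compute, memory)` bound, the assumption that communication does not overlap with compute, and every name here (`Device`, `step_time`, `step_cost`, and all parameters) are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical accelerator description (all fields are illustrative)."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bw: float         # memory bandwidth, bytes/s
    link_bw: float        # interconnect bandwidth, bytes/s
    cost_per_hour: float  # rental or amortized cost per device-hour


def step_time(flops: float, mem_bytes: float, comm_bytes: float, dev: Device) -> float:
    """Estimate the time of one step in seconds.

    Assumption: compute and memory traffic overlap perfectly (roofline-style
    bound), while communication is fully exposed and simply added on top.
    """
    compute_t = flops / dev.peak_flops
    memory_t = mem_bytes / dev.mem_bw
    comm_t = comm_bytes / dev.link_bw
    return max(compute_t, memory_t) + comm_t


def step_cost(seconds: float, dev: Device, n_devices: int) -> float:
    """Convert a step time into a cost across n_devices identical devices."""
    return seconds / 3600.0 * dev.cost_per_hour * n_devices


if __name__ == "__main__":
    dev = Device(peak_flops=1e12, mem_bw=1e11, link_bw=1e10, cost_per_hour=3600.0)
    t = step_time(flops=2e12, mem_bytes=1e11, comm_bytes=1e10, dev=dev)
    print(f"step time: {t:.3f} s, cost on 2 devices: {step_cost(t, dev, 2):.3f}")
```

Even a toy model like this makes the trade-off visible: a faster interconnect shrinks the communication term but usually raises `cost_per_hour`, so the cheapest configuration is not necessarily the fastest one.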
Findings
The model effectively analyzes performance-cost trade-offs across different system architectures.
It incorporates recent innovations such as flash attention and mixture of experts into the performance modeling.
It provides insights for optimizing hardware-software co-design for LLMs.
Abstract
Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and adaptability. However, the exponential growth in LLM size and complexity has outpaced advancements in compute capacity, memory bandwidth, network performance, and cost efficiency, posing significant challenges to their scalability on distributed systems. To address these limitations, alternative model architectures, optimization strategies, communication-aware network topologies, and novel system design approaches have been proposed in the literature. This paper introduces a performance-cost modeling methodology for LLM training and inference that integrates state-of-the-art compute techniques with memory optimizations and the latest communication techniques. Building on an analytical performance…
