Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini

TL;DR
This paper presents a co-design framework for next-generation AI data centers built for large language models. It jointly explores compute, memory, network topology, and parallelism strategies to improve scalability, efficiency, and performance.
Contribution
It introduces and evaluates the FullFlat network architecture and a co-design methodology for optimizing data center components for large language models.
Findings
FullFlat networks improve performance and scalability.
Overlapping compute and communication enhances efficiency.
Design choices significantly impact Model FLOPS Utilization.
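The link between overlap and Model FLOPS Utilization (MFU) can be sketched with a small back-of-the-envelope model. This is not the paper's simulator. It assumes the common approximation of about 6 FLOPs per parameter per token for a training step, and the function names, the `overlap` parameter, and all numbers below are hypothetical values chosen for illustration.

```python
def step_time(compute_s: float, comm_s: float, overlap: float = 0.0) -> float:
    """Step time when a fraction `overlap` of communication is hidden behind compute.

    overlap=0.0 means fully serialized (compute + comm);
    overlap=1.0 means as much communication as possible is hidden,
    which gives max(compute, comm).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    # Communication can only be hidden while compute is running.
    hidden = min(overlap * comm_s, compute_s)
    return compute_s + comm_s - hidden


def mfu(params: float, tokens_per_step: float, step_s: float,
        num_devices: int, peak_flops_per_device: float) -> float:
    """Model FLOPS Utilization: useful model FLOPs / peak FLOPs available.

    Assumes ~6 FLOPs per parameter per token (forward + backward),
    ignoring the attention term.
    """
    model_flops = 6.0 * params * tokens_per_step
    return model_flops / (step_s * num_devices * peak_flops_per_device)


# Hypothetical configuration, for illustration only.
PARAMS = 1e9            # model parameters
TOKENS = 1e6            # tokens processed per step
DEVICES = 8             # accelerators
PEAK = 1e15             # peak FLOP/s per accelerator
COMPUTE_S, COMM_S = 1.0, 0.5

serial = mfu(PARAMS, TOKENS, step_time(COMPUTE_S, COMM_S, 0.0), DEVICES, PEAK)
overlapped = mfu(PARAMS, TOKENS, step_time(COMPUTE_S, COMM_S, 1.0), DEVICES, PEAK)
print(f"MFU without overlap: {serial:.2f}")
print(f"MFU with full overlap: {overlapped:.2f}")
```

In this toy setting, hiding communication behind compute shortens the step and raises MFU without changing the hardware, which is the qualitative effect the findings describe.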
Abstract
The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and cost-effectiveness. Our work provides a comprehensive co-design framework that jointly explores FLOPS, HBM bandwidth and capacity, multiple network topologies (two-tier vs. FullFlat optical), the size of the scale-out domain, and popular parallelism/optimization strategies used in LLMs. We introduce and evaluate FullFlat network architectures, which provide uniform high-bandwidth, low-latency connectivity between all nodes, and demonstrate their transformative impact on performance and scalability. Through detailed sensitivity analyses, we quantify the benefits of overlapping compute and communication, leveraging hardware-accelerated collectives, widening the scale-out domain, and increasing…
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Topic Modeling
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · GPT-4
