FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, and Tushar Krishna

TL;DR
FRED is a flexible wafer-scale interconnect designed for high-performance distributed DNN training, significantly reducing training times across various models by optimizing communication patterns and supporting in-switch collective operations.
Contribution
The paper introduces FRED, a novel wafer-scale interconnect that enhances flexibility and performance for distributed DNN training, enabling efficient execution of diverse parallelization strategies.
Findings
FRED speeds up ResNet-152 training by 1.76X.
FRED speeds up Transformer-17B training by 1.87X.
FRED supports in-switch collective communication, halving network traffic.
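The traffic-halving claim can be sanity-checked with the standard all-reduce cost model. The sketch below is not the paper's model. It assumes the comparison point is an endpoint-based ring all-reduce and that the switch is idealized: it reduces incoming data in-network and broadcasts the result. All function names are hypothetical.

```python
def ring_all_reduce_bytes_sent(num_nodes: int, data_bytes: float) -> float:
    """Bytes each endpoint injects in a ring all-reduce.

    A ring all-reduce is a reduce-scatter followed by an all-gather.
    Each phase takes (num_nodes - 1) steps, and each step sends a chunk
    of data_bytes / num_nodes.
    """
    return 2 * (num_nodes - 1) / num_nodes * data_bytes


def in_switch_all_reduce_bytes_sent(data_bytes: float) -> float:
    """Bytes each endpoint injects when the switch performs the reduction.

    Assumption: each endpoint sends its data to the switch once and
    receives the reduced result once.
    """
    return data_bytes


def traffic_ratio(num_nodes: int, data_bytes: float = 1.0) -> float:
    """Endpoint-based traffic divided by in-switch traffic."""
    return (ring_all_reduce_bytes_sent(num_nodes, data_bytes)
            / in_switch_all_reduce_bytes_sent(data_bytes))


if __name__ == "__main__":
    for n in (2, 4, 16, 64):
        print(f"{n:>3} endpoints: ratio = {traffic_ratio(n):.3f}")
```

Under these assumptions the ratio is 2(N-1)/N, which approaches 2 as the number of endpoints N grows, so "halving" holds approximately for large groups rather than exactly.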
Abstract
Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making it an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of…
Taxonomy
Topics: Wireless Body Area Networks
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Softmax · Layer Normalization · Linear Warmup With Cosine Annealing
