FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Saeed Rashidi; William Won; Sudarshan Srinivasan; Puneet Gupta; and Tushar Krishna

arXiv:2406.19580 · cs.AR · June 10, 2025

TL;DR

FRED is a flexible wafer-scale interconnect designed for high-performance distributed DNN training, significantly reducing training times across various models by optimizing communication patterns and supporting in-switch collective operations.

Contribution

The paper introduces FRED, a novel wafer-scale interconnect that enhances flexibility and performance for distributed DNN training, enabling efficient execution of diverse parallelization strategies.

Findings

01

FRED improves ResNet-152 training time by 1.76X.

02

FRED improves Transformer-17B training time by 1.87X.

03

FRED supports in-switch execution of collective communication, which reduces network traffic by approximately 2X.
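A back-of-the-envelope model helps explain where a roughly 2X traffic reduction from in-switch collectives can come from. This is an assumption-laden illustration, not the paper's evaluation methodology: it compares the bytes each NPU injects for an endpoint-based ring All-Reduce (a Reduce-Scatter followed by an All-Gather) with an idealized in-switch All-Reduce in which the switch reduces the data and multicasts the result back. The function names and the 64 MiB buffer size are hypothetical.

```python
"""Back-of-the-envelope per-NPU traffic model for All-Reduce (illustrative assumptions only)."""


def ring_allreduce_bytes_sent(num_npus: int, data_bytes: float) -> float:
    """Bytes each NPU injects for an endpoint-based ring All-Reduce.

    Reduce-Scatter and All-Gather each take (N - 1) steps, and in every step
    an NPU sends one chunk of size D / N, giving 2 * (N - 1) / N * D in total.
    """
    n = num_npus
    return 2 * (n - 1) / n * data_bytes


def in_switch_allreduce_bytes_sent(num_npus: int, data_bytes: float) -> float:
    """Bytes each NPU injects when the switch performs the reduction (assumed model).

    Each NPU sends its full buffer to the switch once; the switch reduces and
    multicasts the result, so no NPU forwards data on behalf of its peers.
    num_npus is kept only for a signature symmetric with the ring model.
    """
    return data_bytes


if __name__ == "__main__":
    D = 64 * 2**20  # hypothetical 64 MiB gradient buffer
    for n in (4, 16, 64):
        ring = ring_allreduce_bytes_sent(n, D)
        switch = in_switch_allreduce_bytes_sent(n, D)
        print(f"N={n:3d}  ring={ring / 2**20:6.1f} MiB  "
              f"in-switch={switch / 2**20:6.1f} MiB  ratio={ring / switch:.2f}")
```

Under these assumptions the ratio is 2(N - 1)/N, which approaches 2 as the number of NPUs N grows; actual savings on a real fabric depend on topology, message size, and the switch implementation.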

Abstract

Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks into multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making it an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of…
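To make the point about parallelization strategies concrete, the following sketch splits 16 NPUs into model-parallel and data-parallel groups and measures, for a naive row-major placement on a 2D mesh without wraparound links, the largest hop distance between ring neighbours in each group. Everything here is hypothetical: the rank layout, the 4x4 mesh, and the function names are assumptions for illustration, not FRED's design.

```python
"""Hypothetical mapping of a hybrid data/model-parallel job onto a 2D mesh."""

from typing import List, Tuple


def parallel_groups(dp: int, mp: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (model-parallel groups, data-parallel groups) for dp * mp NPUs.

    Assumed rank layout: rank = d * mp + m, so model-parallel peers are
    consecutive ranks and data-parallel peers are strided by mp.
    """
    mp_groups = [[d * mp + m for m in range(mp)] for d in range(dp)]
    dp_groups = [[d * mp + m for d in range(dp)] for m in range(mp)]
    return mp_groups, dp_groups


def max_ring_hops(group: List[int], mesh_width: int) -> int:
    """Largest Manhattan distance between ring neighbours on a row-major 2D mesh."""
    coords = [divmod(r, mesh_width) for r in group]  # (row, column) of each NPU
    hops = 0
    for i, (y, x) in enumerate(coords):
        ny, nx = coords[(i + 1) % len(coords)]  # next NPU in the ring, wrapping around
        hops = max(hops, abs(y - ny) + abs(x - nx))
    return hops


if __name__ == "__main__":
    width = 4  # hypothetical 4x4 mesh of 16 NPUs
    for dp, mp in ((4, 4), (8, 2), (2, 8)):
        mp_groups, dp_groups = parallel_groups(dp, mp)
        mp_hops = max(max_ring_hops(g, width) for g in mp_groups)
        dp_hops = max(max_ring_hops(g, width) for g in dp_groups)
        print(f"DP={dp} MP={mp}: MP ring max hops={mp_hops}, DP ring max hops={dp_hops}")
```

Different dp/mp splits produce groups with very different shapes on the same physical mesh (for example, strided data-parallel groups when mp = 2), which illustrates why an interconnect that serves many parallelization strategies efficiently is desirable.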

Taxonomy

Topics: Wireless Body Area Networks

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Softmax · Layer Normalization · Linear Warmup With Cosine Annealing