Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training
Zixuan Chen, Xuandong Liu, Minglin Li, Yinfan Hu, Hao Mei, Huifeng Xing, Hao Wang, Wanxin Shi, Sen Liu, and Yang Xu

TL;DR
Rina introduces an in-network aggregation approach integrated with Ring-AllReduce, significantly improving throughput and deployment flexibility in distributed deep learning training.
Contribution
The paper presents Rina, a framework that incorporates In-network Aggregation (INA) into Ring-AllReduce (RAR), enabling incremental deployment and higher performance in distributed model training.
Findings
Rina achieves over 50% throughput improvement compared to PS-based INA methods.
Rina offers better incremental deployment capabilities with minimal hardware changes.
Extensive evaluations confirm Rina's superior performance in distributed training scenarios.
Abstract
Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the "incast" issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with PS to mitigate its incast issue. However, such PS-based INA has poor incremental deployment abilities as it requires replacing all the switches to show significant performance improvement, which is not cost-effective. In this study, we present the incorporation of INA capabilities into RAR, called RAR with In-Network Aggregation (Rina), to tackle both the problems above. Rina features its agent-worker mechanism. When an INA-capable ToR switch is deployed, all workers in this rack run as one…
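To make the "long dependency chain" of RAR concrete, the following is a minimal Python sketch of a textbook Ring-AllReduce (sum), simulated in one process. It is not Rina's implementation and does not model INA or the agent-worker mechanism; the function name `ring_allreduce` and the chunking scheme are assumptions made for illustration. It shows that N workers need 2(N-1) sequential communication steps, each of which depends on the step before it.

```python
def ring_allreduce(vectors):
    """Simulate a standard Ring-AllReduce (sum) in a single process.

    vectors: one equal-length list of numbers per worker.
    Returns (per-worker results, number of sequential communication steps).
    Hypothetical helper for illustration only; not code from the paper.
    """
    n = len(vectors)
    length = len(vectors[0])
    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(i * length // n, (i + 1) * length // n) for i in range(n)]
    data = [list(v) for v in vectors]
    steps = 0

    # Phase 1, reduce-scatter: in step s, worker w sends chunk (w - s) % n
    # to its right neighbour, which adds it to its own copy.
    for s in range(n - 1):
        # Snapshot all payloads first so the sends in one step are simultaneous.
        sends = []
        for w in range(n):
            c = (w - s) % n
            lo, hi = bounds[c]
            sends.append((c, data[w][lo:hi]))
        for w in range(n):
            dst = (w + 1) % n
            c, payload = sends[w]
            lo, _ = bounds[c]
            for k, val in enumerate(payload):
                data[dst][lo + k] += val
        steps += 1

    # Worker w now holds the fully reduced chunk (w + 1) % n.
    # Phase 2, all-gather: in step s, worker w forwards chunk (w + 1 - s) % n,
    # and the receiver overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w + 1 - s) % n
            lo, hi = bounds[c]
            sends.append((c, data[w][lo:hi]))
        for w in range(n):
            dst = (w + 1) % n
            c, payload = sends[w]
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
        steps += 1

    return data, steps


if __name__ == "__main__":
    grads = [[float(w + k) for k in range(8)] for w in range(4)]
    result, steps = ring_allreduce(grads)
    print(result[0], steps)
```

Because the step count grows linearly with the number of ring participants, any scheme that shortens the ring shortens this chain.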
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
