Performance Characterization of Distributed Deep Learning Strategies: A Quantitative Evaluation of DDP, FSDP, and Parameter Server Architectures on GPU Clusters
Md Sultanul Islam Ovi

TL;DR
This paper empirically compares distributed deep learning strategies—DDP, FSDP, and Parameter Server—on GPU clusters, analyzing their performance, memory efficiency, and accuracy impacts to guide system design choices.
Contribution
It provides a comprehensive, side-by-side evaluation of the three main distributed training paradigms across different hardware setups, highlighting their trade-offs and optimal use cases.
Findings
FSDP reduces peak memory usage by 4-6x, aiding memory-constrained training.
Asynchronous Parameter Server speeds up training by 28% but causes 4-17% accuracy loss.
DDP offers a 2-3x throughput speedup on high-performance clusters.
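As a rough illustration of why sharding reduces peak memory, the sketch below compares idealized per-GPU memory when model state is replicated on every GPU (as in DDP) and when it is sharded evenly across GPUs (as in FSDP). It is a back-of-the-envelope model, not the paper's measurement method. The 16 bytes per parameter (fp32 weights, fp32 gradients, and two fp32 Adam moments), the function names, and the example sizes are all assumptions chosen for illustration. The model also ignores the full parameters that FSDP temporarily gathers for each layer during compute, so real savings would likely be smaller than this estimate.

```python
# Assumption: fp32 weights (4 B) + fp32 gradients (4 B) + two fp32 Adam moments (8 B).
BYTES_PER_PARAM = 4 + 4 + 8


def model_state_bytes(num_params: int, world_size: int, sharded: bool) -> float:
    """Idealized per-GPU bytes for weights, gradients, and optimizer state."""
    total = num_params * BYTES_PER_PARAM
    # Replicated (DDP-style): every GPU holds the full state.
    # Sharded (FSDP-style): each GPU holds a 1/world_size slice.
    return total / world_size if sharded else float(total)


def peak_memory_ratio(num_params: int, world_size: int, activation_bytes: float) -> float:
    """Ratio of replicated to sharded per-GPU memory.

    Activations are counted in full on both sides because they depend on the
    local batch and are not sharded in this simplified model.
    """
    replicated = model_state_bytes(num_params, world_size, sharded=False) + activation_bytes
    sharded = model_state_bytes(num_params, world_size, sharded=True) + activation_bytes
    return replicated / sharded


if __name__ == "__main__":
    # Hypothetical configuration: 1B parameters, 8 GPUs, 1 GB of activations per GPU.
    print(round(peak_memory_ratio(1_000_000_000, 8, 1e9), 2))
```

With no activation memory, the ratio equals the number of GPUs. Once unsharded activations are included, the ratio falls below that, which is one plausible reason a measured reduction can be smaller than the GPU count.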
Abstract
Efficiently scaling deep neural networks across GPU clusters requires navigating complex trade-offs between computational throughput, memory utilization, and synchronization overhead. This paper presents a unified empirical evaluation of three dominant distributed training paradigms: Distributed Data Parallel (DDP), Fully Sharded Data Parallel (FSDP), and the Parameter Server (PS) architecture. We conduct side-by-side benchmarking on both high-performance (NVIDIA A100) and commodity-class (NVIDIA A10G) clusters to isolate the impact of communication bandwidth and gang-scheduling dependencies. Our results indicate that while DDP achieves a 2-3x speedup in training throughput for standard architectures, FSDP demonstrates a 4-6x reduction in peak memory usage, validating its utility for memory-constrained environments despite higher communication latency. Furthermore, we evaluate the…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications
