PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning
Alireza Olama, Andreas Lundell, Izzat El Hajj, Johan Lilius, and Jerker Björkqvist

TL;DR
PruneX is a distributed training system that uses hierarchical structured pruning to cut inter-node communication and improve scaling efficiency on multi-node GPU clusters.
Contribution
It introduces a hierarchical structured pruning algorithm and a co-designed system architecture that together reduce communication overhead during distributed CNN training.
Findings
Reduces inter-node communication volume by ~60%.
Achieves 6.75x strong scaling speedup on 64 GPUs.
Outperforms both the dense baseline and gradient compression methods.
Abstract
Inter-node communication bandwidth increasingly constrains distributed training at scale on multi-node GPU clusters. While compact models are the ultimate deployment target, conventional pruning-aware distributed training systems typically fail to reduce communication overhead because unstructured sparsity cannot be efficiently exploited by highly optimized dense collective primitives. We present PruneX, a distributed data-parallel training system that co-designs pruning algorithms with cluster hierarchy to reduce inter-node bandwidth usage. PruneX introduces the Hierarchical Structured ADMM (H-SADMM) algorithm, which enforces node-level structured sparsity before inter-node synchronization, enabling dynamic buffer compaction that eliminates both zero-valued transmissions and indexing overhead. The system adopts a leader-follower execution model with separated intra-node and inter-node…
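The buffer compaction idea in the abstract can be illustrated with a minimal sketch. This is not the H-SADMM algorithm or the PruneX implementation. It assumes that structured sparsity means whole output channels are pruned, that every node already agrees on the same set of kept channels (here derived from a hypothetical `shared` reference matrix), and that a plain element-wise mean stands in for the dense all-reduce. All function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]  # each row is one output channel (filter)


def top_k_channels(weights: Matrix, keep: int) -> List[int]:
    """Indices of the `keep` channels with the largest L2 norm, ascending."""
    norms = [sum(v * v for v in row) ** 0.5 for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:keep])


def compact(weights: Matrix, kept: List[int]) -> List[float]:
    """Pack only the kept channels into one dense, contiguous buffer."""
    return [v for i in kept for v in weights[i]]


def expand(buffer: List[float], kept: List[int], channels: int, width: int) -> Matrix:
    """Scatter a compact buffer back to full size; pruned channels stay zero."""
    out = [[0.0] * width for _ in range(channels)]
    for slot, i in enumerate(kept):
        out[i] = buffer[slot * width:(slot + 1) * width]
    return out


def allreduce_mean(buffers: List[List[float]]) -> List[float]:
    """Stand-in for a dense all-reduce: element-wise mean across nodes."""
    n = len(buffers)
    return [sum(vals) / n for vals in zip(*buffers)]


# Two simulated nodes, each holding a 4-channel x 2-element tensor.
node_a = [[1.0, 2.0], [0.0, 0.1], [3.0, 4.0], [0.1, 0.0]]
node_b = [[3.0, 2.0], [0.1, 0.0], [1.0, 0.0], [0.0, 0.1]]

# Assumption: the kept-channel set is already common to all nodes.
shared = [[2.0, 2.0], [0.05, 0.05], [2.0, 2.0], [0.05, 0.05]]
kept = top_k_channels(shared, keep=2)

# Each node sends only its kept channels. Because the mask is shared,
# no per-element indices accompany the buffer.
buffers = [compact(node_a, kept), compact(node_b, kept)]
reduced = allreduce_mean(buffers)
synced = expand(reduced, kept, channels=4, width=2)

dense_elems = len(node_a) * len(node_a[0])
print(f"sent {len(reduced)} of {dense_elems} elements per node")
```

Because entire channels are dropped and the mask is common to all participants, the compacted buffer is itself a dense array, so an ordinary dense collective can operate on it directly. Unstructured sparsity would instead require sending indices alongside values, which is the overhead the abstract says PruneX avoids.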
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques
