PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning

Alireza Olama, Andreas Lundell, Izzat El Hajj, Johan Lilius, and Jerker Björkqvist

arXiv:2512.14628 · cs.DC · December 17, 2025

TL;DR

PruneX is a distributed data-parallel training system that uses hierarchical structured pruning to cut inter-node communication volume by roughly 60% and improve scaling efficiency on multi-node GPU clusters.

Contribution

PruneX introduces a hierarchical structured pruning algorithm, H-SADMM, and a co-designed system architecture that together reduce communication overhead during distributed CNN training.

Findings

01  Reduces inter-node communication volume by ~60%.

02  Achieves a 6.75x strong scaling speedup on 64 GPUs.

03  Outperforms the dense baseline and gradient compression methods.

Abstract

Inter-node communication bandwidth increasingly constrains distributed training at scale on multi-node GPU clusters. While compact models are the ultimate deployment target, conventional pruning-aware distributed training systems typically fail to reduce communication overhead because unstructured sparsity cannot be efficiently exploited by highly optimized dense collective primitives. We present PruneX, a distributed data-parallel training system that co-designs pruning algorithms with cluster hierarchy to reduce inter-node bandwidth usage. PruneX introduces the Hierarchical Structured ADMM (H-SADMM) algorithm, which enforces node-level structured sparsity before inter-node synchronization, enabling dynamic buffer compaction that eliminates both zero-valued transmissions and indexing overhead. The system adopts a leader-follower execution model with separated intra-node and inter-node…
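The buffer-compaction idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's H-SADMM algorithm or its implementation: the ADMM updates are omitted, filters are selected by L2 norm purely for illustration, and the all-reduce is simulated with an in-process average. All function names are hypothetical. The sketch assumes every node agrees on the same channel mask, which is what would let the compacted buffers be reduced with an ordinary dense collective and no index metadata.

```python
import numpy as np

def channel_mask(weight, keep_ratio):
    """Boolean mask over output channels; keeps the largest-norm filters.
    (Norm-based selection is an assumption, not the paper's criterion.)"""
    norms = np.linalg.norm(weight.reshape(weight.shape[0], -1), axis=1)
    k = max(1, int(round(keep_ratio * weight.shape[0])))
    mask = np.zeros(weight.shape[0], dtype=bool)
    mask[np.argsort(norms)[-k:]] = True
    return mask

def compact(tensor, mask):
    """Pack only the kept channels into a contiguous dense buffer."""
    return np.ascontiguousarray(tensor[mask])

def expand(buf, mask, shape):
    """Scatter a compacted buffer back into a full-size tensor of zeros."""
    out = np.zeros(shape, dtype=buf.dtype)
    out[mask] = buf
    return out

def simulated_allreduce_mean(buffers):
    """Stand-in for an inter-node all-reduce: element-wise mean."""
    return np.mean(np.stack(buffers), axis=0)

rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))   # (out_ch, in_ch, kH, kW)
mask = channel_mask(weight, keep_ratio=0.5)  # shared by all nodes (assumed)

# Two "nodes", each holding a gradient whose pruned channels are zero.
node_grads = []
for _ in range(2):
    g = rng.standard_normal(weight.shape)
    g[~mask] = 0.0
    node_grads.append(g)

sent = [compact(g, mask) for g in node_grads]      # what crosses the network
reduced = simulated_allreduce_mean(sent)
restored = expand(reduced, mask, weight.shape)

dense_mean = np.mean(np.stack(node_grads), axis=0)  # dense reference result
volume_ratio = sent[0].size / node_grads[0].size    # fraction of data sent
```

Because whole channels are dropped and the mask is shared, the compacted buffer stays dense and contiguous, so the result matches the dense reduction while only the kept fraction of elements is transmitted. With unstructured sparsity, each node would also have to send element indices, which is the overhead the abstract says structured sparsity avoids.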

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Parallel Computing and Optimization Techniques