Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks
Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li, Bin Dong

TL;DR
This paper introduces a layer-parallel training framework for residual networks (ResNets) in which low-capacity auxiliary networks generate each processor's input, enabling data augmentation and parallel forward and backward passes. The reported result is a significant speedup over serial training with comparable accuracy.
Contribution
The work proposes a joint learning framework using auxiliary-variable methods with auxiliary networks, enabling layer-parallel training of ResNets with data augmentation and reduced communication overhead.
Findings
Achieves significant speedup over traditional serial training methods.
Maintains comparable accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.
Enables data augmentation during parallel training of ResNets.
Abstract
Gradient-based methods for the distributed training of residual networks (ResNets) typically require a forward pass of the input data, followed by back-propagating the error gradient to update model parameters, which becomes time-consuming as the network goes deeper. To break the algorithmic locking and exploit synchronous module parallelism in both the forward and backward modes, auxiliary-variable methods have attracted much interest lately but suffer from significant communication overhead and lack of data augmentation. In this work, a novel joint learning framework for training realistic ResNets across multiple compute devices is established by trading off the storage and recomputation of external auxiliary variables. More specifically, the input data of each independent processor is generated from its low-capacity auxiliary network (AuxNet), which permits the use of data…
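To make the idea in the abstract concrete, here is a minimal NumPy sketch of how an auxiliary network can break the forward dependency between two stages of a residual network. It is an illustration under stated assumptions, not the authors' implementation: the two-stage split, the tanh residual block, the one-block `aux_net`, the noise-based augmentation, and the squared-error `mismatch` term are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # feature width (hypothetical)


def residual_block(x, w):
    # One residual update: x_{k+1} = x_k + tanh(x_k W)
    return x + np.tanh(x @ w)


def run_stage(x, weights):
    # Apply a stack of residual blocks in sequence.
    for w in weights:
        x = residual_block(x, w)
    return x


# A 6-block network split into two stages, one per (imagined) device.
stage1 = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)]
stage2 = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)]

# Hypothetical low-capacity auxiliary network: one block standing in for stage 1.
aux_w = rng.normal(scale=0.1, size=(dim, dim))


def aux_net(x):
    return residual_block(x, aux_w)


x = rng.normal(size=(8, dim))
# Stand-in for data augmentation: a freshly perturbed batch.
x_aug = x + 0.01 * rng.normal(size=x.shape)

# Serial forward pass: stage 2 must wait for the output of stage 1.
serial_out = run_stage(run_stage(x_aug, stage1), stage2)

# Layer-parallel forward pass: stage 2 takes its input from the auxiliary
# network, so the two stages have no data dependency on each other.
stage1_out = run_stage(x_aug, stage1)
stage2_in = aux_net(x_aug)
parallel_out = run_stage(stage2_in, stage2)

# Assumed consistency measure: how far the auxiliary input is from the
# true stage-1 output. A training loop would drive this toward zero.
mismatch = float(np.mean((stage1_out - stage2_in) ** 2))
```

Because `stage2_in` is recomputed from the augmented batch rather than stored per training sample, a new augmentation simply yields a new auxiliary input, which is one way to read the trade-off between storage and recomputation mentioned in the abstract.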
Taxonomy
Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Domain Adaptation and Few-Shot Learning
