Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks

Qi Sun; Hexin Dong; Zewei Chen; Jiacheng Sun; Zhenguo Li; Bin Dong

arXiv:2112.05387 · cs.LG · December 13, 2021

TL;DR

This paper introduces a parallel training framework for residual networks in which a low-capacity auxiliary network (AuxNet) generates the input for each processor. This permits data augmentation and module parallelism in both the forward and backward passes, speeding up training while keeping accuracy comparable.

Contribution

The work proposes a joint learning framework using auxiliary-variable methods with auxiliary networks, enabling layer-parallel training of ResNets with data augmentation and reduced communication overhead.

Findings

01. Achieves significant speedup over traditional serial training methods.

02. Maintains comparable accuracy on CIFAR-10, CIFAR-100, and ImageNet datasets.

03. Enables data augmentation during parallel training of ResNets.

Abstract

Gradient-based methods for the distributed training of residual networks (ResNets) typically require a forward pass of the input data, followed by back-propagating the error gradient to update model parameters, which becomes time-consuming as the network goes deeper. To break the algorithmic locking and exploit synchronous module parallelism in both the forward and backward modes, auxiliary-variable methods have attracted much interest lately but suffer from significant communication overhead and lack of data augmentation. In this work, a novel joint learning framework for training realistic ResNets across multiple compute devices is established by trading off the storage and recomputation of external auxiliary variables. More specifically, the input data of each independent processor is generated from its low-capacity auxiliary network (AuxNet), which permits the use of data…
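
The idea described in the abstract, that each processor's input comes from an auxiliary network rather than from the previous processor's output, can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: the two-stage split, the `tanh` residual blocks, the single-layer `aux_net`, the quadratic coupling penalty, and all function and variable names are hypothetical and are not taken from the paper, whose exact objective is not given in the truncated abstract above.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_stage(x, weights):
    """Apply a stack of toy residual blocks: x <- x + tanh(x @ W)."""
    for W in weights:
        x = x + np.tanh(x @ W)
    return x

def aux_net(x, A):
    """Hypothetical low-capacity AuxNet: a single cheap residual layer that
    produces the input of a later stage directly from the raw batch."""
    return x + np.tanh(x @ A)

def parallel_forward(x, target, stage1, stage2, A):
    """Evaluate two stages that do not wait on each other (assumed setup)."""
    lam = aux_net(x, A)                   # stage-2 input, computed from x alone
    out1 = residual_stage(x, stage1)      # would run on worker 1
    out2 = residual_stage(lam, stage2)    # would run on worker 2, concurrently
    # Assumed quadratic penalty tying stage 1's output to the AuxNet output.
    penalty = 0.5 * np.mean((out1 - lam) ** 2)
    # Toy task loss on the final stage (mean squared error).
    task_loss = 0.5 * np.mean((out2 - target) ** 2)
    return {"lam": lam, "out1": out1, "out2": out2,
            "penalty": penalty, "task_loss": task_loss}

dim, batch = 4, 8
stage1 = [0.1 * rng.standard_normal((dim, dim)) for _ in range(3)]
stage2 = [0.1 * rng.standard_normal((dim, dim)) for _ in range(3)]
A = 0.1 * rng.standard_normal((dim, dim))

# A fresh (possibly augmented) batch: no stored per-sample auxiliary
# variables are needed because lam is recomputed from x by aux_net.
x = rng.standard_normal((batch, dim))
target = rng.standard_normal((batch, dim))

res = parallel_forward(x, target, stage1, stage2, A)
print("penalty:", res["penalty"], "task loss:", res["task_loss"])
```

In this sketch the stage-2 input depends only on `x` and `A`, so neither stage has to wait for the other. When the penalty is driven to zero, `lam` equals `out1` and the two-stage computation matches the ordinary serial forward pass.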

Taxonomy

Topics: Advanced Neural Network Applications · Medical Imaging and Analysis · Domain Adaptation and Few-Shot Learning