Hierarchical Training of Deep Neural Networks Using Early Exiting
Yamin Sepehri, Pedram Pad, Ahmet Caner Yüzügüler, Pascal Frossard, L. Andrea Dunbar

TL;DR
This paper introduces a hierarchical training method for deep neural networks that uses early exits to split training between edge and cloud, reducing communication cost, training runtime, and privacy concerns while maintaining accuracy.
Contribution
It presents a novel use of early exits in a hierarchical training framework, allowing the edge and the cloud to train simultaneously without sharing raw data or exchanging gradients in the backward pass.
Findings
Reduces training runtime by up to 81% on certain architectures.
Incurs only negligible accuracy loss despite the reduced communication.
Effective for online learning on low-resource edge devices.
Abstract
Deep neural networks provide state-of-the-art accuracy for vision tasks, but they require significant resources for training. Consequently, they are trained on cloud servers far from the edge devices that acquire the data, which increases communication cost, runtime, and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in an architecture divided between edge and cloud workers to reduce communication cost, training runtime, and privacy concerns. The method introduces a new use case for early exits: separating the backward pass of the network between the edge and the cloud during training. This addresses a limitation of most existing methods, which, because of the sequential nature of training, either cannot train the levels of the hierarchy simultaneously or do so at the cost of compromising privacy.…
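A toy sketch may help make the separated backward pass concrete. It is not the authors' implementation: it assumes a hypothetical one-weight-per-stage linear model, squared-error losses, and plain SGD, and the names `train`, `w1`, `a`, and `w2` are illustrative only. The edge trains its layer and an early-exit head on a local loss and sends only its activation forward; the cloud trains its own layer on that activation and never sends a gradient back.

```python
# Toy sketch, NOT the authors' implementation: a hypothetical two-stage linear
# model split between an "edge" and a "cloud" worker. The edge trains its layer
# (w1) together with an early-exit head (a) on a local loss; the cloud trains its
# own weight (w2) on the activation it receives. Only activations travel
# edge -> cloud, and no gradient is ever sent back cloud -> edge.


def train(xs, ts, epochs=300, lr=0.05):
    w1, a = 0.5, 0.5   # edge layer weight and early-exit head weight
    w2 = 0.5           # cloud layer weight
    sent_to_cloud = 0  # floats sent edge -> cloud (forward activations)
    sent_to_edge = 0   # floats sent cloud -> edge (stays 0: no backward messages)
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            # Edge forward pass; the raw input x never leaves the edge.
            h = w1 * x
            # Early-exit loss (a * h - t) ** 2 and its gradients, computed locally.
            err_e = a * h - t
            grad_a = 2 * err_e * h
            grad_w1 = 2 * err_e * a * x
            # Transmit the activation; the cloud treats h as a constant input.
            sent_to_cloud += 1
            # Edge update depends only on the early-exit loss.
            a, w1 = a - lr * grad_a, w1 - lr * grad_w1
            # Cloud forward and backward pass on the received activation only.
            err_c = w2 * h - t
            w2 -= lr * 2 * err_c * h
    return w1, a, w2, sent_to_cloud, sent_to_edge


def mse(preds, ts):
    return sum((p - t) ** 2 for p, t in zip(preds, ts)) / len(ts)


xs = [-1.0, -0.5, 0.5, 1.0]
ts = [2.0 * x for x in xs]  # toy regression target t = 2x
w1, a, w2, up, down = train(xs, ts)
edge_loss = mse([a * w1 * x for x in xs], ts)    # early-exit predictions
cloud_loss = mse([w2 * w1 * x for x in xs], ts)  # final-exit predictions
print(f"edge loss {edge_loss:.2e}, cloud loss {cloud_loss:.2e}")
print(f"floats sent edge->cloud: {up}, cloud->edge: {down}")
```

The loop runs both workers sequentially only to count messages; in a deployment they could proceed concurrently, since the edge never waits for a gradient from the cloud. In conventional split training, by contrast, the cloud would return the gradient of its loss with respect to `h` for every sample, so the backward-message count would equal the forward one instead of staying at zero.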
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques
