Revisiting Locally Supervised Learning: an Alternative to End-to-end Training
Yulin Wang, Zanlin Ni, Shiji Song, Le Yang, Gao Huang

TL;DR
This paper introduces InfoPro, a locally supervised learning method that reduces memory usage in deep network training by preserving information in modules, enabling high-resolution training with less memory and potential acceleration.
Contribution
It proposes an information propagation loss and a surrogate optimization method to improve local module training, achieving competitive performance with significantly reduced memory footprint.
Findings
Uses less than 40% of the memory required by end-to-end training.
Maintains competitive accuracy on five datasets including ImageNet.
Enables asynchronous training of local modules for faster training.
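The memory finding can be pictured with a rough, idealized sketch. It assumes that activation storage dominates memory, that end-to-end training must hold every module's activations until the backward pass, and that local training can free a module's activations once that module's own backward pass is done. The function name and the per-module sizes are hypothetical, chosen only for illustration. They are not measurements from the paper, and the estimate ignores parameters, optimizer state, and any auxiliary heads.

```python
def peak_activation_memory(module_sizes, local):
    """Idealized peak activation memory, in arbitrary units.

    module_sizes: activation size of each module (hypothetical values).
    local: if True, only one module's activations are held at a time;
           if False (end-to-end), all modules' activations are held together.
    """
    if local:
        return max(module_sizes)
    return sum(module_sizes)


# Hypothetical activation sizes for a network split into four modules.
sizes = [3.0, 3.0, 2.0, 2.0]

e2e_peak = peak_activation_memory(sizes, local=False)
local_peak = peak_activation_memory(sizes, local=True)
ratio = local_peak / e2e_peak

print(f"E2E peak: {e2e_peak}, local peak: {local_peak}, ratio: {ratio:.2f}")
```

Under these assumptions the peak scales with the largest module rather than with the whole network, which is why splitting into more modules tends to lower the footprint.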
Abstract
Due to the need to store the intermediate activations for back-propagation, end-to-end (E2E) training of deep networks usually suffers from a high GPU memory footprint. This paper aims to address this problem by revisiting locally supervised learning, where a network is split into gradient-isolated modules and trained with local supervision. We experimentally show that simply training local modules with E2E loss tends to collapse task-relevant information at early layers, and hence hurts the performance of the full model. To avoid this issue, we propose an information propagation (InfoPro) loss, which encourages local modules to preserve as much useful information as possible, while progressively discarding task-irrelevant information. As InfoPro loss is difficult to compute in its original form, we derive a feasible upper bound as a surrogate optimization objective, yielding a simple…
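The idea of gradient-isolated modules can be sketched with a toy two-module scalar network and hand-written gradients. This is only an illustration of plain local supervision with a squared-error loss on each module; it is not the InfoPro loss or its surrogate, which the abstract above does not spell out. The function `local_step`, the auxiliary head `a1`, and all numeric values are assumptions made for the example.

```python
def local_step(w1, a1, w2, x, y, lr=0.05):
    """One locally supervised update for a two-module scalar network.

    Module 1 computes h = w1 * x and is trained through a hypothetical
    auxiliary head a1 with local loss (a1 * h - y) ** 2.
    Module 2 computes y_hat = w2 * h and is trained with (w2 * h - y) ** 2,
    treating h as a constant, so no gradient flows back into module 1.
    """
    h = w1 * x

    # Module 1: gradients of its own local loss only.
    err1 = a1 * h - y
    grad_w1 = 2.0 * err1 * a1 * x
    grad_a1 = 2.0 * err1 * h

    # Module 2: gradient with respect to w2 only (h is not differentiated).
    err2 = w2 * h - y
    grad_w2 = 2.0 * err2 * h

    return w1 - lr * grad_w1, a1 - lr * grad_a1, w2 - lr * grad_w2


def full_loss(w1, w2, x, y):
    """Squared error of the full two-module network."""
    return (w2 * w1 * x - y) ** 2


x, y = 2.0, 3.0                 # a single hypothetical training example
w1, a1, w2 = 0.5, 1.0, 0.1      # hypothetical initial parameters
initial_loss = full_loss(w1, w2, x, y)

for _ in range(50):
    w1, a1, w2 = local_step(w1, a1, w2, x, y)

final_loss = full_loss(w1, w2, x, y)
print(f"full-model loss: {initial_loss:.4f} -> {final_loss:.6f}")
```

Because the update to `w1` never uses `w2`, the first module can be updated without waiting for the second module's backward pass, which is the property that makes asynchronous training of modules possible.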
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
