Block-wise Training of Residual Networks via the Minimizing Movement Scheme
Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

TL;DR
This paper introduces a novel layer-wise training method for Residual Networks inspired by the minimizing movement scheme, addressing locking problems and improving test accuracy in constrained training environments.
Contribution
It develops a kinetic energy regularization approach for block-wise training of ResNets, alleviating stagnation and enabling parallel training.
Findings
Improved test accuracy of ResNets with block-wise training.
Effective alleviation of layer overfitting and stagnation.
Compatibility with sequential and parallel training modes.
Abstract
End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and it suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly well-adapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily trained early layers overfit and deeper layers stop increasing test…
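The kinetic energy regularization described in the abstract can be illustrated with a minimal sketch. It assumes a toy residual block of the form x + tanh(Wx), and it borrows the 1/(2τ) weighting from the standard minimizing movement scheme. The names `residual_update`, `block_forward`, `kinetic_energy`, `blockwise_loss` and `tau` are hypothetical and not taken from the paper, and the exact loss the authors use may differ.

```python
import math

# Hypothetical sketch: all names and the toy block are illustrative assumptions,
# not the authors' implementation.

def residual_update(x, w):
    """Residual r(x) = tanh(W x) of a toy block, with W given as a list of rows."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

def block_forward(x, w):
    """Residual block output x + r(x)."""
    return [xi + ri for xi, ri in zip(x, residual_update(x, w))]

def kinetic_energy(batch, w):
    """Mean squared displacement ||block(x) - x||^2 = ||r(x)||^2 over the batch."""
    total = 0.0
    for x in batch:
        total += sum(ri * ri for ri in residual_update(x, w))
    return total / len(batch)

def blockwise_loss(task_loss, batch, w, tau):
    """Local objective for one block: task loss plus kinetic energy / (2 * tau).

    task_loss is assumed to come from an auxiliary classifier attached to this
    block; a smaller tau penalizes large displacements more strongly.
    """
    return task_loss + kinetic_energy(batch, w) / (2.0 * tau)

# Usage: a 1-D block with weight 1.0 on a two-sample batch.
batch = [[0.5], [-0.5]]
w = [[1.0]]
loss = blockwise_loss(task_loss=1.0, batch=batch, w=w, tau=0.5)
```

In this sketch, each block is trained against its own `blockwise_loss`, so the penalty discourages a block from moving its inputs further than the task loss justifies.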
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design · Ferroelectric and Negative Capacitance Devices · Stochastic Gradient Optimization Techniques
Methods: Test
