Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation
Yibo Yang, Xiaojie Li, Motasem Alfarra, Hasan Hammoud, Adel Bibi, Philip Torr, Bernard Ghanem

TL;DR
This paper introduces a novel local training strategy for neural networks that successively reconciles gradients between modules, improving performance and reducing memory use without relying on global back-propagation.
Contribution
It provides the first theoretical analysis of gradient reconciliation in local learning and proposes a new method that enhances local training with significant empirical gains.
Findings
Achieves performance competitive with global BP on ImageNet.
Reduces memory consumption by over 40%.
Improves training stability and convergence in local learning.
Abstract
Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic due to the biological implausibility and huge memory consumption caused by BP. Among the existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has been proved to be effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without…
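The reconciliation condition in the abstract compares two gradients taken at the same activation: the gradient of one module's local loss with respect to its output, and the gradient of the next module's local loss with respect to its input. The sketch below is only an illustration of that comparison, not the paper's method, whose details are cut off in the abstract above. It assumes linear modules, squared-error local losses, a hypothetical auxiliary head `aux_head` for the first module, and a squared Euclidean distance as the mismatch measure; all function names are invented for this example.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def sub(a, b):
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def grad_wrt_input(W, inp, target):
    """Gradient of 0.5 * ||W @ inp - target||^2 with respect to inp."""
    residual = sub(matvec(W, inp), target)
    return matvec(transpose(W), residual)


def reconciliation_gap(aux_head, next_module, h, target):
    """Squared distance between the two local gradients at activation h.

    Assumption: module k's local loss is 0.5 * ||aux_head @ h - target||^2
    and module k+1's local loss is 0.5 * ||next_module @ h - target||^2.
    """
    g_out = grad_wrt_input(aux_head, h, target)    # module k, w.r.t. its output
    g_in = grad_wrt_input(next_module, h, target)  # module k+1, w.r.t. its input
    return sum(d * d for d in sub(g_out, g_in))


h = [1.0, 2.0]          # activation passed from module k to module k+1
target = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[2.0, 0.0], [0.0, 1.0]]

gap_same = reconciliation_gap(identity, identity, h, target)
gap_diff = reconciliation_gap(identity, scaled, h, target)
```

When both local losses induce the same gradient at `h`, the gap is zero; when the next module pulls `h` in a different direction, the gap is positive. A regularizer built on such a quantity would penalize that disagreement, though the exact form used in the paper may differ.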
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
