Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Yibo Yang; Xiaojie Li; Motasem Alfarra; Hasan Hammoud; Adel Bibi; Philip Torr; Bernard Ghanem

arXiv:2406.05222 · cs.LG · June 11, 2024 · 1 citation

TL;DR

This paper introduces a novel local training strategy for neural networks that successively reconciles gradients between modules, improving performance and reducing memory use without relying on global back-propagation.

Contribution

It provides the first theoretical analysis of gradient reconciliation in local learning and proposes a new method that enhances local training with significant empirical gains.

Findings

01. Achieves competitive performance with global BP on ImageNet.

02. Reduces memory consumption by over 40%.

03. Improves training stability and convergence in local learning.

Abstract

Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic due to the biological implausibility and huge memory consumption caused by BP. Among the existing solutions, local learning optimizes gradient-isolated modules of a neural network with local errors and has been proved to be effective even on large-scale datasets. However, the reconciliation among local errors has never been investigated. In this paper, we first theoretically study non-greedy layer-wise training and show that the convergence cannot be assured when the local gradient in a module w.r.t. its input is not reconciled with the local gradient in the previous module w.r.t. its output. Inspired by the theoretical result, we further propose a local training strategy that successively regularizes the gradient reconciliation between neighboring modules without…
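
The mismatch the abstract describes, between the local gradient in a module w.r.t. its input and the local gradient in the previous module w.r.t. its output, can be illustrated with a toy calculation. The sketch below is not the paper's method: it assumes two linear maps acting on the same feature vector `h`, a squared-error local loss for each, and a squared-distance measure of the gap between the two gradients. The names `aux_head_prev`, `next_module`, and `reconciliation_gap` are hypothetical and chosen only for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M (a list of rows)."""
    return [list(col) for col in zip(*M)]


def grad_wrt_features(linear_map, h, target):
    """Gradient of 0.5 * ||linear_map @ h - target||^2 with respect to h."""
    residual = [p - t for p, t in zip(matvec(linear_map, h), target)]
    return matvec(transpose(linear_map), residual)


def reconciliation_gap(aux_head_prev, next_module, h, target):
    """Squared distance between two local gradients taken at the same features h.

    g_out: gradient of the previous module's local loss w.r.t. its output h
           (assumed here to come from a linear auxiliary head).
    g_in:  gradient of the next module's local loss w.r.t. its input h
           (assumed here to be a single linear layer).
    """
    g_out = grad_wrt_features(aux_head_prev, h, target)
    g_in = grad_wrt_features(next_module, h, target)
    return sum((a - b) ** 2 for a, b in zip(g_in, g_out))


# Toy values, chosen by hand for illustration only.
h = [1.0, 2.0]                      # features passed between the two modules
target = [0.0, 1.0]                 # shared regression target
W2 = [[1.0, 0.0], [0.0, 1.0]]       # next module (identity map)
A1_swapped = [[0.0, 1.0], [1.0, 0.0]]  # previous module's auxiliary head

gap_same = reconciliation_gap(W2, W2, h, target)          # identical maps
gap_diff = reconciliation_gap(A1_swapped, W2, h, target)  # mismatched maps
```

When the two maps coincide, the two gradients agree and the gap vanishes; when they differ, the gap is positive. A training scheme could add such a quantity as a penalty to each local loss, though the exact regularizer used in the paper is not stated in the text above.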

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer