TL;DR
This paper introduces a low-dimensional error feedback method for training large neural networks, matching traditional backpropagation performance while being more biologically plausible and computationally efficient.
Contribution
It presents a novel local learning rule based on Feedback Alignment that uses low-dimensional error signals, extending to complex architectures and challenging the need for high-dimensional gradients.
Findings
Low-dimensional error signals can match backpropagation performance.
The method enables efficient training of convolutional and transformer networks.
It offers a biologically plausible alternative to traditional gradient-based learning.
Abstract
Training deep neural networks typically relies on backpropagating high-dimensional error signals, a computationally intensive process with little evidence supporting its implementation in the brain. However, since most tasks involve low-dimensional outputs, we propose that low-dimensional error signals may suffice for effective learning. To test this hypothesis, we introduce a novel local learning rule based on Feedback Alignment that leverages indirect, low-dimensional error feedback to train large networks. Our method decouples the backward pass from the forward pass, enabling precise control over error signal dimensionality while maintaining high-dimensional representations. We begin with a detailed theoretical derivation for linear networks, which forms the foundation of our learning framework, and extend our approach to nonlinear, convolutional, and transformer architectures.…
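To make the idea of a rank-limited error signal concrete, here is a minimal NumPy sketch for a two-layer linear network. It is not the paper's learning rule: the feedback weights are fixed and random, in the style of plain Feedback Alignment, whereas the abstract describes a rule built on top of it. The function name `low_dim_feedback_step`, the matrices `P` (compresses the output error to `r` dimensions) and `B` (broadcasts the compressed error to the hidden layer), the teacher matrix `T`, and all sizes are assumptions chosen for illustration.

```python
import numpy as np


def low_dim_feedback_step(W1, W2, B, P, X, Y, lr=0.01):
    """One update of a two-layer linear network with rank-limited error feedback.

    X: (n_in, n_samples) inputs, Y: (n_out, n_samples) targets.
    P: (r, n_out) compresses the output error to r dimensions (hypothetical name).
    B: (n_hidden, r) sends the compressed error to the hidden layer (hypothetical name).
    Both P and B are fixed and random here; they are never trained.
    """
    n = X.shape[1]
    H = W1 @ X                     # hidden activity, (n_hidden, n)
    E = W2 @ H - Y                 # output error, (n_out, n)
    E_low = P @ E                  # low-dimensional error, (r, n)
    delta_h = B @ E_low            # hidden teaching signal; replaces W2.T @ E
    dW1 = delta_h @ X.T / n        # local update: teaching signal times input
    dW2 = E @ H.T / n              # delta rule for the output layer
    loss = 0.5 * np.mean(np.sum(E ** 2, axis=0))
    return W1 - lr * dW1, W2 - lr * dW2, dW1, loss


rng = np.random.default_rng(0)
n_in, n_hidden, n_out, r, n = 20, 32, 10, 3, 256

X = rng.standard_normal((n_in, n))
T = rng.standard_normal((n_out, n_in))      # hypothetical linear teacher
Y = T @ X

W1 = 0.1 * rng.standard_normal((n_hidden, n_in))
W2 = 0.1 * rng.standard_normal((n_out, n_hidden))
P = rng.standard_normal((r, n_out)) / np.sqrt(n_out)
B = rng.standard_normal((n_hidden, r)) / np.sqrt(r)

# Backpropagation update for W1 at the same weights, for comparison.
bp_dW1 = W2.T @ (W2 @ W1 @ X - Y) @ X.T / n

W1_new, W2_new, dW1, loss = low_dim_feedback_step(W1, W2, B, P, X, Y)

print("rank of low-dimensional update:", np.linalg.matrix_rank(dW1))
print("rank of backprop update:", np.linalg.matrix_rank(bp_dW1))
```

Because the hidden teaching signal passes through an `r`-dimensional bottleneck, the hidden-layer update can have rank at most `r`, while the backpropagation update is limited only by the output dimension. Calling the step in a loop gives a training run; whether the loss then falls as fast as with backpropagation depends on the learning rate and on how well the feedback aligns with the forward weights, which this sketch does not tune.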
Peer Reviews
Decision·Submitted to ICLR 2025
- Theoretical extensions of the Kolen-Pollack algorithm are very clear and easy to follow. Theoretical contributions were validated well on simulations in linear models.
- Experiments on CIFAR-10 with different numbers of classes seem interesting given the dimensionality match between the number of classes and the rank of the matrix B.
- The experiments on the receptive fields show an interesting relationship between features learned by the model and the rank of the gradient.
- Given the difficulty FA has in matching backpropagation on datasets with more classes (as shown by Lillicrap et al.), the paper would benefit from experiments on ImageNet, as in Akrout et al., 2019. Similarly to the CIFAR100 experiments, the rank of the B follows the number of categories in ImageNet. Would the same hold on other datasets such as SVHN?
- Novelty of low-rank gradients. There has been extensive work in the implicit regularization literature (see examples below) showing that gradient descent is regularized to
1. The subject of biological alternatives to back-propagation (BP) is a really interesting topic.
2. The authors provide a good theoretical foundation for their method.
3. The study shows that error dimensionality shapes receptive fields, which is interesting and provides insights on the emergence of representations in both artificial and biological systems.
4. The empirical study shows competitive performance with reduced error signal dimensionality, which could reduce computational costs withou
1. The paper is not very well-written in multiple aspects. The use of figurative language and lack of precision make it somewhat ambiguous and less rigorous, and I would consider rephrasing to focus on the main concept in a more direct way. This can be seen in the first paragraph of the introduction, for example. The paper is sometimes very redundant, making the read feel repetitive and unnecessarily lengthy. The notations are repeated (e.g. numerous mentions of $\mathbf{x}$ being the input vector)
1. The proposed method performs well and also provides a good approximation to backprop.
2. Sec. 5 results are interesting, as they suggest the same type of phenomena can be due to either forward or backward architectural choices (but see below).
3. Overall, this result shows that error signals have to at least match the task dimensionality for efficient learning, which provides useful intuition for bioplausible learning search.
1. Small improvement compared to [Akrout 2019]. The proposed approach doesn’t add much to the discussion in [Akrout 2019], at least in my opinion. [Akrout 2019] (using Eq. 10 (top) for $P=I$ in the definitions of this paper) showed that the Kolen-Pollack rule is good enough to achieve backprop-level performance on ImageNet. The proposed approach introduces one more feedback layer per forward layer, which is arguably even less biologically plausible than the feedback alignment-style feedback netw
