Local Learning on Transformers via Feature Reconstruction
Priyank Pathak, Jingwei Zhang, Dimitris Samaras

TL;DR
This paper introduces a novel local learning method for transformers that reconstructs input features instead of entire images, reducing memory usage and maintaining high performance across multiple datasets.
Contribution
The paper is the first to apply local learning to transformers. Each local module reconstructs its input features rather than the entire image, which reduces memory requirements while improving performance.
Findings
Outperforms InfoPro-Transformer by up to 0.58% on several datasets.
Uses up to 12% less memory than previous methods.
Requires 36-45% less GPU memory compared to end-to-end training.
Abstract
Transformers are becoming increasingly popular due to their superior performance over conventional convolutional neural networks (CNNs). However, transformers usually require a much larger amount of memory to train than CNNs, which prevents their application in many low-resource settings. Local learning, which divides the network into several distinct modules and trains them individually, is a promising alternative to the end-to-end (E2E) training approach to reduce the amount of memory for training and to increase parallelism. This paper is the first to apply local learning to transformers for this purpose. The standard CNN-based local learning method, InfoPro [32], reconstructs the input images for each module in a CNN. However, reconstructing the entire image does not generalize well. In this paper, we propose a new mechanism for each local module, where instead of reconstructing the…
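The idea of local learning with feature reconstruction can be illustrated with a small sketch. This is not the authors' implementation: the class name `LocalModule`, the method `local_step`, the layer sizes, and the learning rate are all hypothetical, a linear layer with `tanh` stands in for a transformer block, and only a reconstruction loss is used (any task loss is left out). What the sketch shows is that each module updates its weights from a loss computed on its own input features, and that the features it passes on are treated as constants, so no gradient crosses a module boundary.

```python
import numpy as np

rng = np.random.default_rng(0)


class LocalModule:
    """Hypothetical gradient-isolated block (stand-in for a transformer stage).

    W maps input features to output features; D is an auxiliary decoder
    that tries to reconstruct the module's own input features from its output.
    """

    def __init__(self, d_in, d_out, lr=0.1):
        self.W = rng.normal(0.0, 0.1, (d_in, d_out))
        self.D = rng.normal(0.0, 0.1, (d_out, d_in))
        self.lr = lr

    def local_step(self, x):
        """One update driven only by this module's reconstruction loss."""
        h = np.tanh(x @ self.W)              # module output features
        err = h @ self.D - x                 # reconstruction error on the input features
        loss = float(np.mean(err ** 2))

        # Manual backward pass, confined to this module.
        g_out = 2.0 * err / err.size         # d(loss) / d(reconstruction)
        g_D = h.T @ g_out
        g_pre = (g_out @ self.D.T) * (1.0 - h ** 2)   # back through tanh
        g_W = x.T @ g_pre

        self.W -= self.lr * g_W
        self.D -= self.lr * g_D
        # h is handed on as plain data: the next module never
        # sends a gradient back into this one.
        return loss, h


# Toy "input features"; in the paper's setting these would come from image data.
X = rng.normal(size=(64, 8))
modules = [LocalModule(8, 6), LocalModule(6, 4)]
history = [[] for _ in modules]

for _ in range(300):
    feats = X
    for i, module in enumerate(modules):
        loss, feats = module.local_step(feats)
        history[i].append(loss)

print("module 0 loss: first", history[0][0], "last", history[0][-1])
print("final feature shape:", feats.shape)
```

Because each backward pass stays inside one module, only that module's activations need to be held for the update, which is the source of the memory saving that local learning aims for. The sketch does not attempt to reproduce the memory or accuracy figures reported for the paper.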
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
