Training Deep Architectures Without End-to-End Backpropagation: A Survey on the Provably Optimal Methods
Shiyu Duan, Jose C. Principe

TL;DR
This survey reviews provably optimal modular training methods for deep architectures that avoid end-to-end backpropagation, highlighting their advantages, performance, and implications for deep learning scalability and understanding.
Contribution
The survey provides a comprehensive overview of modular training alternatives to backpropagation, emphasizing their practical benefits and potential to improve deep learning workflows.
Findings
Modular training can match or outperform end-to-end backpropagation on datasets like ImageNet.
Modular approaches offer greater transparency and scalability in deep learning.
They provide solutions to data efficiency and transferability estimation challenges.
Abstract
This tutorial paper surveys provably optimal alternatives to end-to-end backpropagation (E2EBP), the de facto standard for training deep architectures. Modular training refers to strictly local training that uses neither an end-to-end forward pass nor an end-to-end backward pass: a deep architecture is divided into several nonoverlapping modules, which are trained separately without any end-to-end operation. Between the fully global E2EBP and strictly local modular training lie weakly modular hybrids, which drop only the end-to-end backward pass. These alternatives can match or surpass the performance of E2EBP on challenging datasets such as ImageNet, and they are gaining attention primarily because they offer practical advantages over E2EBP, which will be enumerated herein. In particular, they allow for greater modularity and transparency in deep learning workflows, aligning deep learning…
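The idea of training nonoverlapping modules separately can be illustrated with a toy sketch. The example below is only an assumption-laden illustration, not the survey's method: the data, the function names (`module1`, `local_loss`, `train_module1`, `train_module2`), and the pairwise local objective are all hypothetical stand-ins chosen to keep the code short. The input module is trained first against a local objective that needs no output layer, is then frozen, and only afterwards is the output module fit on the frozen features. No gradient ever flows from the output module back into the input module.

```python
import math

# Toy 1-D binary classification data (made up for illustration).
X = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
Y = [0, 0, 0, 1, 1, 1]


def module1(x, w, b):
    """Input module: a single tanh unit."""
    return math.tanh(w * x + b)


def local_loss(w, b):
    """Hypothetical local objective for the input module.

    Pulls same-class hidden features together and pushes
    different-class features apart. It never touches the output module.
    """
    h = [module1(x, w, b) for x in X]
    same, diff = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = (h[i] - h[j]) ** 2
            (same if Y[i] == Y[j] else diff).append(d)
    return sum(same) / len(same) - sum(diff) / len(diff)


def train_module1(w=0.1, b=0.0, lr=0.5, steps=200, eps=1e-5):
    """Gradient descent on the local loss (central finite differences)."""
    for _ in range(steps):
        gw = (local_loss(w + eps, b) - local_loss(w - eps, b)) / (2 * eps)
        gb = (local_loss(w, b + eps) - local_loss(w, b - eps)) / (2 * eps)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def train_module2(H, a=0.0, c=0.0, lr=1.0, steps=200):
    """Output module: logistic regression on frozen features H."""
    n = len(H)
    for _ in range(steps):
        P = [1.0 / (1.0 + math.exp(-(a * h + c))) for h in H]
        ga = sum((p - y) * h for p, y, h in zip(P, Y, H)) / n
        gc = sum(p - y for p, y in zip(P, Y)) / n
        a, c = a - lr * ga, c - lr * gc
    return a, c


# Stage 1: train the input module alone, then freeze it.
w, b = train_module1()
H = [module1(x, w, b) for x in X]

# Stage 2: train the output module on the frozen features only.
a, c = train_module2(H)

preds = [1 if a * h + c > 0 else 0 for h in H]
accuracy = sum(p == y for p, y in zip(preds, Y)) / len(Y)
print(accuracy)
```

In an end-to-end setup, `w` and `b` would instead be updated using gradients of the final classification loss propagated back through the output module. Here the two stages share nothing but the frozen features `H`.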
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
