Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
Anna Choromanska, Benjamin Cowen, Sadhana Kumaravel, Ronny Luss, Mattia Rigotti, Irina Rish, Brian Kingsbury, Paolo DiAchille, Viatcheslav Gurev, Ravi Tejwani, Djallel Bouneffouf

TL;DR
This paper introduces an online alternating minimization method for training deep neural networks. It is aimed at known limitations of backpropagation and comes with theoretical guarantees and promising empirical results across several architectures.
Contribution
It presents the first online stochastic alternating minimization (AM) algorithm for deep learning, with convergence guarantees and more flexible training than traditional offline (batch) auxiliary-variable methods.
Findings
Effective training on large datasets using online AM
Theoretical convergence guarantees in stochastic settings
Empirical success across multiple neural network architectures
Abstract
Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online…
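To make the idea of "breaking the nested objective into local subproblems" concrete, here is a minimal sketch of mini-batch alternating minimization for a two-layer ReLU network with one auxiliary code per sample. It is an illustration under stated assumptions, not the authors' algorithm: the penalty weight `mu`, the learning rate `lr`, the toy regression data, and the helper names `online_am_step` and `forward_mse` are all hypothetical, and the local weight updates are plain gradient steps chosen for brevity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward_mse(W1, W2, X, Y):
    """Ordinary feed-forward loss, used here only for monitoring."""
    return float(np.mean((relu(X @ W1.T) @ W2.T - Y) ** 2))

def online_am_step(W1, W2, X, Y, mu=1.0, lr=0.05):
    """One alternating-minimization step on a single mini-batch.

    Assumed relaxed objective (illustrative, per sample):
        ||y - W2 c||^2 + mu * ||c - relu(W1 x)||^2
    where c is an auxiliary hidden code.
    """
    batch, hidden = X.shape[0], W1.shape[0]
    pre = X @ W1.T
    H = relu(pre)

    # (1) Code update: the objective is quadratic in c, so the minimizer
    #     solves (W2^T W2 + mu I) c = W2^T y + mu * relu(W1 x).
    A = W2.T @ W2 + mu * np.eye(hidden)
    C = np.linalg.solve(A, (Y @ W2 + mu * H).T).T

    # (2) Output-layer update: local least-squares fit of codes -> targets
    #     (one gradient step; constant factors absorbed into lr).
    W2 = W2 - lr * ((C @ W2.T - Y).T @ C) / batch

    # (3) Hidden-layer update: local fit of inputs -> codes. It depends only
    #     on X and C, so it does not need gradients from the layer above.
    G = (H - C) * (pre > 0)
    W1 = W1 - lr * mu * (G.T @ X) / batch
    return W1, W2

# Toy regression problem (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 5))
Y = (np.sin(X[:, 0]) + 0.5 * X[:, 1]).reshape(-1, 1)
W1 = rng.normal(scale=0.3, size=(16, 5))
W2 = rng.normal(scale=0.3, size=(1, 16))

initial = forward_mse(W1, W2, X, Y)
for epoch in range(50):
    for start in range(0, len(X), 32):  # stream the data in mini-batches
        stop = start + 32
        W1, W2 = online_am_step(W1, W2, X[start:stop], Y[start:stop])
final = forward_mse(W1, W2, X, Y)
print(f"feed-forward MSE: {initial:.3f} -> {final:.3f}")
```

Once the codes are fixed, steps (2) and (3) are independent of each other, which is the property that makes layer-wise parallel weight updates possible in auxiliary-variable schemes. Each step also touches only the current mini-batch, so the full dataset never has to be held in memory.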
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Attention Model
