Beyond Backprop: Online Alternating Minimization with Auxiliary Variables

Anna Choromanska; Benjamin Cowen; Sadhana Kumaravel; Ronny Luss; Mattia Rigotti; Irina Rish; Brian Kingsbury; Paolo DiAchille; Viatcheslav Gurev; Ravi Tejwani; Djallel Bouneffouf

arXiv:1806.09077 · stat.ML · June 6, 2019 · 33 cites

TL;DR

This paper introduces an online alternating minimization method for training deep neural networks, overcoming limitations of backpropagation, with theoretical guarantees and promising empirical results across various architectures.

Contribution

It presents the first online stochastic alternating minimization (AM) algorithm for deep learning, with convergence guarantees and greater training flexibility than traditional offline (batch) methods.

Findings

1. Effective training on large datasets using online AM

2. Theoretical convergence guarantees in stochastic settings

3. Empirical success across multiple neural network architectures

Abstract

Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, inability to handle non-differentiable nonlinearities and to parallelize weight-updates across layers, and biological implausibility. These limitations continue to motivate exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch), which limits their applicability to extremely large datasets, as well as to online, continual or reinforcement learning. The main contribution of our work is a novel online…
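The idea of breaking a nested objective into local subproblems can be illustrated with a small sketch. The code below is not the authors' algorithm and is not taken from the IBM/online-alt-min repository. It assumes a two-layer ReLU network, squared losses, and a quadratic coupling penalty with weight `mu`; all function names and hyperparameters are hypothetical. An auxiliary "code" variable `c` stands in for the hidden activations, so each layer's weights can be updated from a local loss on a single mini-batch.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def update_codes(W2, a, y, mu):
    """Minimize ||y - c W2||^2 + mu * ||c - a||^2 over the codes c.

    The problem is convex in c, and its stationarity condition is
    c (W2 W2^T + mu I) = y W2^T + mu a, which is solved directly.
    """
    h = W2.shape[0]
    A = W2 @ W2.T + mu * np.eye(h)   # (h, h), symmetric positive definite
    B = y @ W2.T + mu * a            # (n, h)
    return np.linalg.solve(A, B.T).T

def code_objective(c, W2, a, y, mu):
    """Value of the code subproblem, useful for sanity checks."""
    return np.sum((y - c @ W2) ** 2) + mu * np.sum((c - a) ** 2)

def online_am_step(W1, W2, x, y, mu=1.0, lr=0.01):
    """One alternating step on a mini-batch (x, y); illustrative only.

    Shapes: x (n, d_in), y (n, d_out), W1 (d_in, h), W2 (h, d_out).
    """
    n = x.shape[0]
    pre = x @ W1
    a = relu(pre)                        # forward activations
    c = update_codes(W2, a, y, mu)       # step 1: update auxiliary codes
    # Step 2: each layer takes a gradient step on its own local loss.
    grad_W2 = -2.0 * c.T @ (y - c @ W2) / n              # fit c W2 to y
    grad_W1 = -2.0 * x.T @ ((c - a) * (pre > 0)) / n     # fit relu(x W1) to c
    return W1 - lr * grad_W1, W2 - lr * grad_W2
```

In this sketch the two weight updates depend only on the codes and on each layer's own inputs, so they could be computed independently of one another, which is the kind of decoupling across layers that the abstract contrasts with backpropagation.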

Code & Models

Repositories

IBM/online-alt-min (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Attention Model