KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro

TL;DR
KOALA is a stochastic optimization method that uses Kalman filtering to adaptively estimate neural network parameters from noisy loss observations. It reaches results comparable to or better than those of state-of-the-art optimizers.
Contribution
The paper presents KOALA, a Kalman filter-based optimizer that models the loss as a noisy observation. Its dynamical model can also capture the gradient dynamics of advanced optimizers such as Momentum and Adam.
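To make the "loss as a noisy observation" idea concrete, here is a minimal Python sketch of a single Kalman update that treats the minibatch loss as a noisy scalar measurement. It is an illustration, not the authors' implementation. The function name `kalman_loss_step`, the target loss `l_target = 0`, the noise variances `q` and `r`, and the choice to keep the covariance as a scalar `p` times the identity (re-projected after each update by averaging its diagonal) are all assumptions. Linearizing the loss with its gradient turns the Kalman gain into a gradient step whose size shrinks as the loss approaches the target.

```python
def kalman_loss_step(x, p, loss, grad, l_target=0.0, q=1e-4, r=1.0):
    """One Kalman-filter step that treats the minibatch loss as a noisy
    scalar measurement of a target value (hypothetical sketch).

    x        -- current parameter estimate (list of floats)
    p        -- scalar variance; the covariance is assumed to be p * I
    l_target -- assumed target loss value that the filter "observes"
    q, r     -- assumed process-noise and measurement-noise variances
    """
    # Predict: random-walk dynamics x_k = x_{k-1} + w_k leave x unchanged
    # and inflate the variance by the process noise q.
    p_pred = p + q
    # Linearize the loss around x: H = grad^T, so H P H^T = p_pred * ||grad||^2.
    g2 = sum(g * g for g in grad)
    s = p_pred * g2 + r             # innovation variance
    innovation = l_target - loss    # measured target minus predicted loss
    # Kalman gain K = p_pred * grad / s; move x by K times the innovation.
    x_new = [xi + (p_pred / s) * innovation * gi for xi, gi in zip(x, grad)]
    # Covariance update P = (I - K H) P_pred, projected back onto p * I by
    # averaging its diagonal (a simplifying assumption of this sketch).
    p_new = p_pred - p_pred * p_pred * g2 / (s * len(x))
    return x_new, p_new


def quadratic(x):
    """Toy loss 0.5 * ||x||^2 with gradient x; its minimum value is 0."""
    return 0.5 * sum(xi * xi for xi in x), list(x)


x, p = [3.0, -2.0], 1.0
losses = []
for _ in range(100):
    loss, grad = quadratic(x)
    losses.append(loss)
    x, p = kalman_loss_step(x, p, loss, grad)
losses.append(quadratic(x)[0])
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.4f}, variance p = {p:.4f}")
```

In this sketch the effective learning rate, `p * (loss - l_target) / (p * ||grad||^2 + r)`, is produced by the filter rather than set by hand, and it shrinks along with the estimated variance `p`.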
Findings
KOALA achieves comparable or better results than state-of-the-art optimizers.
It is easy to implement and scales to large neural networks.
Experimental results span computer vision and language modeling tasks.
Abstract
Optimization is often cast as a deterministic problem, where the solution is found through some iterative procedure such as gradient descent. However, when training neural networks the loss function changes over (iteration) time due to the randomized selection of a subset of the samples. This randomization turns the optimization problem into a stochastic one. We propose to consider the loss as a noisy observation with respect to some reference optimum. This interpretation of the loss allows us to adopt Kalman filtering as an optimizer, as its recursive formulation is designed to estimate unknown parameters from noisy measurements. Moreover, we show that the Kalman Filter dynamical model for the evolution of the unknown parameters can be used to capture the gradient dynamics of advanced methods such as Momentum and Adam. We call this stochastic optimization method KOALA, which is short…
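The abstract's remark that the Kalman dynamical model can capture Momentum can be illustrated by adding a velocity to the filter state. The Python below is a hypothetical sketch, not KOALA's published update. Each coordinate carries a state (x_i, v_i) with transition x <- x + v, v <- gamma * v. The covariance is kept block-diagonal with one 2x2 block per coordinate, and cross-coordinate terms are dropped after each update. The function name `kalman_momentum_step` and the values of `gamma`, `qx`, `qv`, `r` and `l_target` are assumptions.

```python
def kalman_momentum_step(x, v, cov, loss_and_grad, gamma=0.9,
                         qx=1e-4, qv=1e-4, r=1.0, l_target=0.0):
    """One step of a Kalman filter whose state holds parameters x and a
    velocity v (hypothetical sketch).

    cov[i] = (a, b, c) is the 2x2 covariance [[a, b], [b, c]] of (x[i], v[i]).
    Cross-coordinate covariances are dropped after each update, a
    simplifying assumption of this sketch.
    """
    # Predict with constant-velocity dynamics x <- x + v, v <- gamma * v,
    # i.e. transition F = [[1, 1], [0, gamma]] applied per coordinate,
    # and P <- F P F^T + diag(qx, qv).
    x_pred = [xi + vi for xi, vi in zip(x, v)]
    v_pred = [gamma * vi for vi in v]
    cov_pred = [(a + 2 * b + c + qx, gamma * (b + c), gamma * gamma * c + qv)
                for a, b, c in cov]
    # Observe the loss at the predicted parameters; H = [g_1, 0, g_2, 0, ...].
    loss, grad = loss_and_grad(x_pred)
    s = r + sum(g * g * a for g, (a, _, _) in zip(grad, cov_pred))
    innovation = l_target - loss  # assumed target minus observed loss
    x_new, v_new, cov_new = [], [], []
    for xi, vi, g, (a, b, c) in zip(x_pred, v_pred, grad, cov_pred):
        # Gain entries for x and v; the velocity is corrected only through
        # its covariance b with the parameter.
        x_new.append(xi + a * g / s * innovation)
        v_new.append(vi + b * g / s * innovation)
        # Diagonal block of P - K S K^T for this coordinate.
        w = g * g / s
        cov_new.append((a - w * a * a, b - w * a * b, c - w * b * b))
    return x_new, v_new, cov_new


def quadratic(x):
    """Toy loss 0.5 * ||x||^2 with gradient x."""
    return 0.5 * sum(xi * xi for xi in x), list(x)


x, v, cov = [3.0], [0.0], [(1.0, 0.0, 1.0)]
history = []
for _ in range(20):
    x, v, cov = kalman_momentum_step(x, v, cov, quadratic)
    history.append((x[0], v[0], cov[0]))
print(history[0])
```

Because loss observations reach `v` through the off-diagonal term `b`, corrections accumulate in the velocity across steps, much like a momentum buffer, with `gamma` acting as the momentum coefficient.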
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
Methods: Adam
