# Meta-descent for Online, Continual Prediction

**Authors:** Andrew Jacobsen, Matthew Schlegel, Cameron Linke, Thomas Degris, Adam White, Martha White

arXiv: 1907.07751 · 2019-12-16

## TL;DR

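This paper compares vector step-size adaptation methods for online, non-stationary prediction and introduces AdaGain, a general meta-descent algorithm that is robust across several prediction problems and competitive with the state-of-the-art method on a large-scale time-series task using real data from a mobile robot.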

## Contribution

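The paper derives AdaGain, a general, incremental meta-descent algorithm for step-size adaptation. It is designed to apply to a broad range of online prediction algorithms, including those with semi-gradient updates and those with accelerations such as RMSProp, and the paper compares it empirically against quasi-second order methods.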

## Key findings

- Methods from both families (quasi-second order and meta-descent) can perform well.
- In non-stationary prediction problems, meta-descent methods exhibit advantages over quasi-second order methods.
- AdaGain is robust across several prediction problems and is competitive with the state-of-the-art method on a large-scale time-series prediction problem using real data from a mobile robot.

## Abstract

This paper investigates different vector step-size adaptation approaches for non-stationary online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step-sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second order update: a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been as extensively explored as quasi-second order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems, and is competitive with the state-of-the-art method on a large-scale, time-series prediction problem on real data from a mobile robot.
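To make the two families in the abstract concrete, here is a minimal sketch of one update from each, for a linear predictor trained online on squared error. This is not AdaGain, whose derivation is in the full paper. The quasi-second order step follows the standard RMSProp rule, and the meta-descent step uses IDBD (Sutton, 1992), an earlier meta-descent method swapped in here purely for illustration. The function names, default hyperparameters, and the linear squared-error setting are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def rmsprop_step(w, v, x, y, alpha=0.01, decay=0.9, eps=1e-8):
    """One online RMSProp update for a linear predictor (illustrative).

    The per-weight step-size is alpha / sqrt(v): a running statistic of
    squared gradients stands in for curvature information.
    """
    delta = y - w @ x                        # prediction error
    grad = -delta * x                        # gradient of 0.5 * delta**2 w.r.t. w
    v = decay * v + (1.0 - decay) * grad**2  # running mean of squared gradients
    w = w - alpha * grad / (np.sqrt(v) + eps)
    return w, v, delta


def idbd_step(w, beta, h, x, y, theta=0.01):
    """One IDBD update (Sutton, 1992) for a linear predictor (illustrative).

    The per-weight step-sizes alpha = exp(beta) are themselves adapted by
    gradient descent on the squared prediction error, using the trace h.
    """
    delta = y - w @ x                        # prediction error
    beta = beta + theta * delta * x * h      # meta-gradient step on log step-sizes
    alpha = np.exp(beta)                     # vector of positive step-sizes
    w = w + alpha * delta * x                # weight update with vector step-size
    # h tracks how recent updates to each weight depended on its step-size.
    h = h * np.maximum(0.0, 1.0 - alpha * x**2) + alpha * delta * x
    return w, beta, h, delta
```

In an online loop, each function would be called once per sample with the state it returned on the previous step. The design difference is visible in the state each method carries: RMSProp keeps gradient statistics (`v`), while IDBD keeps the step-size parameters (`beta`) and a trace (`h`) so it can descend on the step-sizes directly.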

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.07751/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1907.07751/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1907.07751/full.md

---
Source: https://tomesphere.com/paper/1907.07751