Memory-Based Meta-Learning on Non-Stationary Distributions
Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness

TL;DR
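This paper investigates how well memory-based neural models such as Transformers and LSTMs can approximate Bayes-optimal predictors on piecewise stationary sources with unobserved switching points, a setting that arguably captures an important characteristic of natural language and of action-observation sequences in partially observable environments.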
Contribution
It demonstrates that current sequence models, trained to minimize log loss, can learn to behave as if performing Bayesian inference over the latent switching points and the latent parameters of each segment in non-stationary data.
Findings
Neural models learn to identify switching points accurately.
Models behave as if performing Bayesian inference.
Sequence models approximate Bayes-optimal algorithms effectively.
Abstract
Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs, can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each…
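To make the setting concrete, here is a minimal sketch of a piecewise stationary source with hidden switching points, together with the cumulative log loss of two reference predictors. Everything specific in it is an assumption for illustration and is not taken from the paper: the Bernoulli segments, the uniform prior over each segment's bias, the fixed per-step switching probability, and the function names. The Laplace rule is used because it is the Bayes-optimal predictor for a single stationary Bernoulli segment under a uniform prior. The "oracle" variant restarts it at the true switching points, which a real learner does not observe.

```python
import math
import random


def sample_piecewise_bernoulli(length, switch_prob, rng):
    """Sample bits whose Bernoulli bias is redrawn at hidden switching points.

    Assumption (illustrative only): a switch occurs independently at each
    step with probability `switch_prob`, and each new bias is uniform on [0, 1].
    """
    bias = rng.random()
    bits, switches = [], []
    for t in range(length):
        if t > 0 and rng.random() < switch_prob:
            bias = rng.random()
            switches.append(t)  # index of the first bit of the new segment
        bits.append(1 if rng.random() < bias else 0)
    return bits, switches


def laplace_log_loss(bits):
    """Cumulative log loss (in nats) of the Laplace rule on one sequence.

    The Laplace rule predicts P(next = 1) = (ones + 1) / (n + 2), which is the
    Bayes-optimal prediction for a stationary Bernoulli source under a
    uniform prior on its bias.
    """
    ones = 0
    loss = 0.0
    for n, bit in enumerate(bits):
        p_one = (ones + 1.0) / (n + 2.0)
        loss -= math.log(p_one if bit == 1 else 1.0 - p_one)
        ones += bit
    return loss


def oracle_segment_log_loss(bits, switches):
    """Log loss of the Laplace rule restarted at every true switching point.

    This uses the hidden switching points, so it serves only as a reference
    value and is not a predictor that could be run on observed data alone.
    """
    bounds = [0] + list(switches) + [len(bits)]
    return sum(laplace_log_loss(bits[a:b]) for a, b in zip(bounds, bounds[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    bits, switches = sample_piecewise_bernoulli(500, 0.02, rng)
    print("hidden switching points:", switches)
    print("stationary Laplace rule, log loss:", laplace_log_loss(bits))
    print("oracle-restarted Laplace, log loss:", oracle_segment_log_loss(bits, switches))
```

A sequence model trained with log loss on many such sampled sequences could be compared against reference values of this kind. The sketch does not implement the Bayes-optimal predictor for the switching case itself, which would additionally have to marginalize over the unknown switching points.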
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Gaussian Processes and Bayesian Inference
