Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

Zhongtian Chen; Edmund Lau; Jake Mendel; Susan Wei; Daniel Murfet

arXiv:2310.06301 · cs.LG · October 11, 2023

Open Access

TL;DR

This paper studies phase transitions in the Toy Model of Superposition using Singular Learning Theory. It identifies regular $k$-gon critical points that shape both Bayesian and SGD learning, adding evidence for a sequential learning mechanism.

Contribution

It derives a closed-form formula for the theoretical loss in the Toy Model of Superposition and identifies critical points (regular $k$-gons, in the case of two hidden dimensions) that govern phase transitions in the Bayesian posterior and the behavior of SGD training.

Findings

01

In the case of two hidden dimensions, regular $k$-gons are critical points of the model.

02

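Supporting theory indicates that the local learning coefficient of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size.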

03

Both SGD and Bayesian learning can be characterized as a journey through parameter space, which is consistent with a sequential learning mechanism.

Abstract

We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT). We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular $k$-gons are critical points. We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size. We then show empirically that the same $k$-gon critical points also determine the behavior of SGD training. The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and…
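To make the geometry concrete, here is a minimal sketch of what a regular $k$-gon arrangement looks like in two hidden dimensions: $k$ weight columns placed at equal angles around the origin. The function names `regular_kgon_columns` and `gram_matrix` and the unit `radius` are assumptions made for illustration. The abstract does not state the column lengths or bias values at the actual critical points, so this shows only the shape of the configuration, not the paper's parameters.

```python
import math


def regular_kgon_columns(k, radius=1.0):
    """Return k two-dimensional vectors at the vertices of a regular k-gon.

    `radius` is a placeholder length (an assumption); the source does not
    give the column norm at the critical points.
    """
    if k < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return [
        (radius * math.cos(2 * math.pi * i / k),
         radius * math.sin(2 * math.pi * i / k))
        for i in range(k)
    ]


def gram_matrix(columns):
    """Pairwise dot products of the columns (the entries of W^T W)."""
    return [
        [ux * vx + uy * vy for (vx, vy) in columns]
        for (ux, uy) in columns
    ]


if __name__ == "__main__":
    cols = regular_kgon_columns(5)
    for row in gram_matrix(cols):
        print(" ".join(f"{entry:+.3f}" for entry in row))
```

With a unit radius, each entry of the Gram matrix is the cosine of the angle between two columns, so it depends only on how many steps apart the two vertices are around the polygon.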

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Algorithms · Bayesian Methods and Mixture Models

Methods: Stochastic Gradient Descent