A ghost mechanism: An analytical model of abrupt learning in recurrent networks
Fatih Dinc, Ege Cirakman, Bariscan Kurtkaya, Mert Yuksekgonul, Yiqi Jiang, Mark J. Schnitzer, Hidenori Tanaka

TL;DR
This paper introduces the ghost mechanism, an analytical model that explains abrupt learning in recurrent neural networks through transient slow regions near saddle-node bifurcations, and identifies the factors that govern sudden performance improvements and training failures.
Contribution
The study develops a canonical-form model of ghost points in RNNs, uses it to identify a critical learning rate and the mechanisms behind both abrupt learning and learning failures, and validates these predictions in low-rank and full-rank networks.
Findings
Identifies a critical learning rate that scales as an inverse power law with the timescale of the learned computation.
Demonstrates how vanishing and oscillatory gradients cause learning collapse.
Shows that increasing network rank and reducing output confidence improve learning stability.
Abstract
Abrupt learning is a common phenomenon in recurrent neural networks (RNNs) trained on working memory tasks. In such cases, the networks develop transient slow regions in state space that extend the effective timescales of computation. However, the mechanisms driving sudden performance improvements and their causal role remain unclear. To address this gap, we introduce the ghost mechanism, a process by which dynamical systems exhibit transient slowdown near the remnant of a saddle-node bifurcation. By reducing the high-dimensional dynamics near ghost points, we derive a one-dimensional canonical form that analytically captures learning as a process controlled by a single scale parameter. Using this model, we study a form of abrupt learning emerging from ghost points and identify a critical learning rate that scales as an inverse power law with the timescale of the learned computation.…
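The transient slowdown near a ghost can be illustrated with the textbook normal form of a saddle-node bifurcation, dx/dt = r + x². For small r > 0 this system has no fixed point, but trajectories linger near x = 0, where the fixed points used to be. The sketch below assumes this standard normal form for illustration; it is not necessarily the exact canonical form derived in the paper, and `passage_time` is a hypothetical helper. It computes the passage time through the bottleneck as T = ∫ dx / (r + x²) and compares it with the known asymptotic result T ≈ π/√r.

```python
import math


def passage_time(r, a, b, n=200_000):
    """Time for x' = r + x**2 to travel from x = a to x = b (requires r > 0).

    Assumption: the textbook saddle-node normal form, not the paper's model.
    Since dt = dx / (r + x**2), the time is integrated over x (midpoint rule).
    """
    h = (b - a) / n
    return sum(h / (r + (a + (i + 0.5) * h) ** 2) for i in range(n))


for r in (1e-2, 1e-3, 1e-4):
    s = math.sqrt(r)
    total = passage_time(r, -10.0, 10.0)  # full transit past the ghost
    near_ghost = passage_time(r, -s, s)   # time spent within |x| < sqrt(r)
    print(f"r={r:.0e}  T={total:8.2f}  pi/sqrt(r)={math.pi / s:8.2f}  "
          f"fraction near ghost={near_ghost / total:.2f}")
```

In this toy, roughly half of the transit time is spent in a window of width 2√r around the ghost, and the total time grows as r shrinks. The single parameter r therefore sets the effective timescale, loosely analogous to the single scale parameter the abstract describes.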
