Plateaus, Optima, and Overfitting in Multi-Layer Perceptrons: A Saddle-Saddle-Attractor Scenario
Alex Alì Maleknia, Yuzuru Sato

TL;DR
This paper offers a dynamical systems perspective on training multi-layer perceptrons, revealing how saddle structures influence learning plateaus, overfitting, and convergence limitations, especially with finite noisy data.
Contribution
It introduces a minimal dynamical model, inspired by Fukumizu and Amari, that explains how saddle structures shape training dynamics and overfitting in MLPs without relying on the asymptotic regimes in which these problems are typically analyzed.
Findings
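Training dynamics pass through a plateau region and a near-optimal region, both organized by saddle structures, before converging to an overfitting regime.
Under suitable conditions on the data, the overfitting regime collapses to a single attractor, up to symmetry.
For finite noisy datasets, convergence to the theoretical optimum is impossible, and the dynamics necessarily settle into an overfitting solution.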
Abstract
Vanishing gradients and overfitting are central problems in machine learning, yet are typically analyzed in asymptotic regimes that obscure their dynamical origins. Here we provide a dynamical description of learning in multi-layer perceptrons (MLPs) via a minimal model inspired by Fukumizu and Amari. We show that training dynamics traverse plateau and near-optimal regions, both organized by saddle structures, before converging to an overfitting regime. Under suitable conditions on the data, this regime collapses to a single attractor modulo symmetry. Furthermore, for finite noisy datasets, convergence to the theoretical optimum is impossible, and the dynamics necessarily settle into an overfitting solution.
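The setting described in the abstract can be explored numerically with a small toy experiment. The sketch below is not the authors' model: it assumes a hypothetical teacher with a single tanh unit, a student with two hidden tanh units, a small finite training set with Gaussian label noise, and plain full-batch gradient descent. All names (`make_data`, `student`, `train`) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def make_data(n, noise_std, rng):
    # Assumed teacher: one tanh unit, y = tanh(x) + Gaussian noise.
    x = rng.uniform(-2.0, 2.0, size=n)
    y = np.tanh(x) + noise_std * rng.normal(size=n)
    return x, y

def student(params, x):
    # Student with two hidden units: f(x) = v1*tanh(w1*x) + v2*tanh(w2*x).
    w, v = params[:2], params[2:]
    return np.tanh(np.outer(x, w)) @ v

def mse(params, x, y):
    # Half mean squared error of the student on (x, y).
    return 0.5 * np.mean((student(params, x) - y) ** 2)

def grad(params, x, y):
    # Analytic gradient of mse with respect to (w1, w2, v1, v2).
    w, v = params[:2], params[2:]
    h = np.tanh(np.outer(x, w))            # hidden activations, shape (n, 2)
    err = h @ v - y                        # residuals, shape (n,)
    grad_v = h.T @ err / len(x)
    grad_w = ((1.0 - h ** 2) * v * x[:, None]).T @ err / len(x)
    return np.concatenate([grad_w, grad_v])

def train(steps=2000, lr=0.05, n_train=20, noise_std=0.3, seed=0):
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_data(n_train, noise_std, rng)   # finite noisy training set
    x_te, y_te = make_data(5000, 0.0, rng)            # noise-free test targets
    params = 0.1 * rng.normal(size=4)                 # small init near the origin
    history = []
    for _ in range(steps):
        history.append((mse(params, x_tr, y_tr), mse(params, x_te, y_te)))
        params = params - lr * grad(params, x_tr, y_tr)
    history.append((mse(params, x_tr, y_tr), mse(params, x_te, y_te)))
    return params, np.array(history)

if __name__ == "__main__":
    params, history = train()
    print("final train loss:", history[-1, 0])
    print("final test loss: ", history[-1, 1])
```

Plotting the two columns of `history` against the step index is one way to look for plateau-like stretches and for a gap between training and test loss. Whether and when they appear depends on the assumed seed, noise level, and learning rate.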
