The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
Jialin Mao, Itay Griniasty, Han Kheng Teoh, Rahul Ramesh, Rubing Yang, Mark K. Transtrum, James P. Sethna, Pratik Chaudhari

TL;DR
This paper uses information geometry to show that deep networks explore a common low-dimensional manifold in prediction space during training, regardless of architecture, size, or training method.
Contribution
It introduces a geometric framework revealing that various deep networks' training trajectories lie on a shared low-dimensional manifold in prediction space.
Findings
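Networks with different architectures follow distinguishable trajectories; optimizer, regularization, data augmentation, and weight initialization have minimal influence.
Larger networks train along a similar manifold as smaller ones, just faster.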
Networks initialized differently converge along a similar manifold.
Abstract
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
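The abstract does not spell out how trajectories in prediction space are compared, so the following is only a minimal sketch under stated assumptions: each checkpoint is represented by its predicted class probabilities on a fixed sample set, checkpoints are compared with a per-sample Bhattacharyya distance averaged over samples, and the resulting distance matrix is embedded with a multidimensional-scaling-style eigendecomposition. The function names, the toy trajectories, and the choice of distance are illustrative assumptions, not the authors' code.

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """Mean per-sample Bhattacharyya distance between two models.

    p, q: arrays of shape (n_samples, n_classes) whose rows sum to 1.
    (Assumed choice of distance; the abstract does not name one.)
    """
    bc = np.sum(np.sqrt(p * q), axis=1)           # Bhattacharyya coefficient per sample
    return float(np.mean(-np.log(np.clip(bc, eps, None))))

def pairwise_distances(models):
    """Symmetric matrix of distances between all checkpoints."""
    n = len(models)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = bhattacharyya_distance(models[i], models[j])
    return d

def embed(d, k=3):
    """MDS-style embedding of a distance matrix.

    Double-centers d, eigendecomposes it, and keeps the k directions with
    the largest |eigenvalue|. Returns the coordinates and the fraction of
    total |eigenvalue| mass that each kept direction carries.
    """
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    w = -0.5 * centering @ d @ centering
    vals, vecs = np.linalg.eigh(w)
    order = np.argsort(-np.abs(vals))[:k]
    coords = vecs[:, order] * np.sqrt(np.abs(vals[order]))
    explained = np.abs(vals[order]) / np.sum(np.abs(vals))
    return coords, explained

def toy_trajectory(labels, n_classes, speed, steps=10):
    """Hypothetical training run: predictions sharpen toward the labels over time."""
    onehot = np.eye(n_classes)[labels]
    out = []
    for t in range(steps):
        logits = speed * t * onehot
        e = np.exp(logits)
        out.append(e / e.sum(axis=1, keepdims=True))
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 5, size=200)
    # Two toy "networks" that differ only in how fast they train.
    models = toy_trajectory(labels, 5, speed=0.5) + toy_trajectory(labels, 5, speed=1.0)
    coords, explained = embed(pairwise_distances(models), k=3)
    print("fraction of |eigenvalue| mass in top 3 directions:", explained)
```

If the top few directions carry most of the eigenvalue mass, the checkpoints are well described by a low-dimensional embedding, which is the kind of evidence the abstract refers to. Keeping eigenvalues by absolute value is a deliberate choice here, since a distance matrix of this type need not be Euclidean and can produce negative eigenvalues.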
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
