On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units - Steepest Gradient Descent and Natural Gradient Descent -
Masato Inoue, Hyeyoung Park, Masato Okada

TL;DR
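This paper analyzes how correlation between the teacher's hidden-unit weight vectors in soft committee machines affects on-line learning dynamics. It shows that natural gradient descent avoids plateaus regardless of the correlation in the limit of a low learning rate, whereas conventional gradient descent takes longer to break the symmetry as the correlation rises.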
Contribution
It provides a theoretical analysis of the impact of weight vector correlation on learning dynamics in soft committee machines using statistical mechanics.
Findings
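Conventional (steepest) gradient descent needs more time to break the hidden-unit symmetry as the correlation of the teacher weight vectors rises.
Natural gradient descent shows no plateaus regardless of the correlation, in the limit of a low learning rate.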
Analytical results support the observed dynamics around saddle points.
Abstract
The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus of the learning dynamics in gradient learning methods. The correlation of the weight vectors of hidden units in a teacher network is thought to affect this saddle structure, resulting in a prolonged learning time, but this mechanism is still unclear. In this paper, we discuss it with regard to soft committee machines and on-line learning using statistical mechanics. Conventional gradient descent needs more time to break the symmetry as the correlation of the teacher weight vectors rises. On the other hand, no plateaus occur with natural gradient descent regardless of the correlation for the limit of a low learning rate. Analytical results support these dynamics around the saddle point.
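The setting described in the abstract can be illustrated with a minimal sketch of on-line steepest gradient descent for a two-hidden-unit soft committee machine. Everything specific here is an assumption made for illustration and is not taken from the paper: the activation g(u) = erf(u/sqrt(2)), Gaussian inputs, the eta/N step scaling, the small random initialization, and the helper names make_teacher, sgd_step, and run. Natural gradient descent is not implemented.

```python
import math
import numpy as np

def make_teacher(n_inputs, rho):
    """Two unit-norm teacher weight vectors whose overlap B1.B2 equals rho."""
    B = np.zeros((2, n_inputs))
    B[0, 0] = 1.0
    B[1, 0] = rho
    B[1, 1] = math.sqrt(1.0 - rho ** 2)
    return B

# Assumed activation: g(u) = erf(u / sqrt(2)); math.erf is vectorized for arrays.
_erf = np.vectorize(math.erf, otypes=[float])

def g(u):
    return _erf(np.asarray(u, dtype=float) / math.sqrt(2.0))

def g_prime(u):
    # Derivative of erf(u / sqrt(2)) with respect to u.
    return math.sqrt(2.0 / math.pi) * np.exp(-0.5 * np.asarray(u, dtype=float) ** 2)

def sgd_step(J, B, x, eta):
    """One on-line steepest-descent update on the squared error for one example x."""
    n_inputs = x.shape[0]
    h_student = J @ x                      # student local fields
    h_teacher = B @ x                      # teacher local fields
    delta = g(h_teacher).sum() - g(h_student).sum()
    # Gradient of 0.5 * delta**2 w.r.t. J_i is -delta * g'(h_i) * x.
    return J + (eta / n_inputs) * delta * np.outer(g_prime(h_student), x)

def run(n_inputs=100, rho=0.5, eta=0.5, n_steps=2000, seed=0):
    """Train a student on fresh random examples; return overlaps R[i, n] = J_i . B_n and J."""
    rng = np.random.default_rng(seed)
    B = make_teacher(n_inputs, rho)
    J = 1e-3 * rng.standard_normal((2, n_inputs))  # near-symmetric start (assumption)
    for _ in range(n_steps):
        x = rng.standard_normal(n_inputs)          # each example is used once (on-line)
        J = sgd_step(J, B, x, eta)
    return J @ B.T, J
```

Running run with different values of rho and tracking how long the rows of the overlap matrix stay nearly identical is one way to observe the symmetric phase numerically; the parameter values above are placeholders and would need tuning to reproduce any specific behavior.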
