Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu, Wenlian Lu, Boyu Chen

TL;DR
This paper introduces a theoretical framework for modifying neural network training directions in function space, reducing generalization error by balancing errors from training data and distribution mismatch, supported by analytical derivations and numerical examples.
Contribution
It provides a novel theoretical analysis of a modified natural gradient method in neural tangent kernel space, explaining how training direction adjustments improve generalization.
Findings
Modified training directions reduce total generalization error.
Theoretical analysis explains existing generalization enhancement methods.
Numerical examples validate the theoretical predictions.
Abstract
We present a theoretical analysis of a modified natural gradient descent method in the neural network function space, based on the eigendecompositions of the neural tangent kernel and the Fisher information matrix. We first present an analytical expression for the function learned by this modified natural gradient under the assumptions of a Gaussian distribution and the infinite-width limit. We then explicitly derive the generalization error of the learned neural network function using tools from eigendecomposition and statistical theory. By decomposing the total generalization error into contributions from different eigenspaces of the kernel in function space, we propose a criterion for balancing the errors stemming from the training set and from the distribution discrepancy between the training set and the true data. Through this approach, we establish that modifying the training direction of the neural…
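To make the idea of restricting the training direction to selected kernel eigenspaces concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: the RBF kernel is only a stand-in for a neural tangent kernel, and the helper names (`rbf_kernel`, `modified_direction_step`), the `keep` parameter, and the unit weighting of each kept eigendirection are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Z, length_scale=1.0):
    # Hypothetical stand-in for a neural tangent kernel (assumption).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def modified_direction_step(K, f, y, keep, lr=1.0):
    """One function-space update on the training points, restricted to
    the `keep` leading eigendirections of the kernel Gram matrix K.

    Illustrative only: each kept direction gets unit weight, and the
    remaining directions are left untouched.
    """
    eigvals, eigvecs = np.linalg.eigh(K)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending
    U = eigvecs[:, order[:keep]]              # leading eigenvectors as columns
    residual = f - y
    # Move only along the kept eigenspace; other components stay as they were.
    return f - lr * U @ (U.T @ residual)

# Toy data (assumption): 20 points in 2-D with a smooth target.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sin(X[:, 0])

K = rbf_kernel(X, X)
f0 = np.zeros(20)                             # initial function values
f1 = modified_direction_step(K, f0, y, keep=5)

print("training residual norm before:", np.linalg.norm(f0 - y))
print("training residual norm after: ", np.linalg.norm(f1 - y))
```

In this sketch, a smaller `keep` leaves more of the training residual unfitted, while a larger `keep` fits the training set more closely. The abstract describes a criterion for balancing the error from the training set against the error from distribution discrepancy; how that criterion is actually formulated is not shown here.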
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Blind Source Separation Techniques
Methods: Natural Gradient Descent
