Generative Feature Training of Thin 2-Layer Networks
Johannes Hertrich, Sebastian Neumayer

TL;DR
This paper introduces a method for training thin 2-layer neural networks on small datasets: the hidden weights are initialized with samples from a learned generative model and then refined by gradient steps in the model's latent space, which improves approximation quality.
Contribution
The paper proposes a generative feature training approach that mitigates the local minima that gradient-based training encounters in the non-convex loss landscape of small neural networks.
Findings
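Initializing the hidden weights with samples from a learned generative model improves training outcomes.
Gradient-based post-processing of the sampled weights in the latent space, with regularization against noise, further improves approximation accuracy.
Numerical examples demonstrate the effectiveness of the approach.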
Abstract
We consider the approximation of functions by 2-layer neural networks with a small number of hidden weights based on the squared loss and small datasets. Due to the highly non-convex energy landscape, gradient-based training often suffers from local minima. As a remedy, we initialize the hidden weights with samples from a learned proposal distribution, which we parameterize as a deep generative model. To train this model, we exploit the fact that with fixed hidden weights, the optimal output weights solve a linear equation. After learning the generative model, we refine the sampled weights with a gradient-based post-processing in the latent space. Here, we also include a regularization scheme to counteract potential noise. Finally, we demonstrate the effectiveness of our approach by numerical examples.
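The abstract's key observation, that the optimal output weights solve a linear equation once the hidden weights are fixed, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and not taken from the paper: the `tanh` activation, the small ridge term `reg` (added only for numerical stability), the toy target function, and the helper name `fit_output_weights`. The hidden weights are drawn from a standard normal as a stand-in for samples from the learned proposal distribution.

```python
import numpy as np

def fit_output_weights(X, y, W, b, reg=1e-8):
    """Least-squares output weights for a 2-layer network with FIXED hidden weights.

    Hypothetical helper: tanh activation and ridge term `reg` are assumptions.
    """
    Phi = np.tanh(X @ W.T + b)                      # hidden features, shape (n_samples, n_hidden)
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])    # regularized normal equations
    alpha = np.linalg.solve(A, Phi.T @ y)           # linear solve, no gradient descent needed
    return alpha, Phi

rng = np.random.default_rng(0)

# Small toy dataset (assumed target function, for illustration only).
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

# Stand-in for hidden weights sampled from a learned proposal distribution.
n_hidden = 10
W = rng.normal(size=(n_hidden, 2))
b = rng.normal(size=n_hidden)

alpha, Phi = fit_output_weights(X, y, W, b)
mse = np.mean((Phi @ alpha - y) ** 2)
print(f"training MSE with fixed hidden weights: {mse:.4f}")
```

Because the inner problem has a closed-form solution, the training loss can be treated as a function of the hidden weights alone, which is what makes it possible to score candidate hidden weights drawn from a proposal distribution.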
Taxonomy
Topics: Face and Expression Recognition · Machine Learning and ELM · Advanced Computing and Algorithms
