Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Cheolhyoung Lee, Kyunghyun Cho

TL;DR
This paper introduces an unsupervised method for initializing deep neural networks: it seeks initial parameters whose small perturbations correspond to a diverse set of downstream classification tasks, which improves generalization, especially when labeled data is limited.
Contribution
It proposes a novel unsupervised algorithm based on Maximum Mean Discrepancy to find initial parameters that enhance learning across multiple tasks.
Findings
Improves average test accuracy on MNIST-based tasks.
More effective with fewer labeled examples.
Encourages diverse downstream tasks near initial parameters.
Abstract
Despite the recent success of stochastic gradient descent in deep learning, it is often difficult to train a deep neural network with an inappropriate choice of its initial parameters. Even if training is successful, it is known that the initial parameter configuration may negatively impact generalization. In this paper, we propose an unsupervised algorithm to find a good initialization for input data, given that the downstream task is d-way classification. We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification. We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters. We thus design an algorithm that encourages small perturbations to the initial parameter configuration to lead to a diverse set of d-way…
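The abstract is cut off before it explains how Maximum Mean Discrepancy (MMD) enters the algorithm, so the following is only a minimal sketch of the quantity itself, not the authors' method. It computes the standard biased empirical estimate of squared MMD between two sample sets with a Gaussian kernel. The function names, the bandwidth value, and the toy Gaussian data are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def sample_gaussian(rng, n, dim, mean):
    """Hypothetical toy data: n points with i.i.d. N(mean, 1) coordinates."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
a = sample_gaussian(rng, 100, 2, 0.0)
b = sample_gaussian(rng, 100, 2, 0.0)  # same distribution as a
c = sample_gaussian(rng, 100, 2, 3.0)  # mean shifted by 3 in each coordinate

same = mmd2_biased(a, b)
shifted = mmd2_biased(a, c)
print(f"same distribution:    {same:.4f}")
print(f"shifted distribution: {shifted:.4f}")
```

The estimate is close to zero when both sample sets come from the same distribution and grows as the distributions move apart, which is what makes MMD usable as a measure of how different two sets of samples are. The result depends on the kernel bandwidth, which here is fixed at 1.0 purely for convenience.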
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Test
