Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy

Cheolhyoung Lee; Kyunghyun Cho

arXiv:2302.04369·cs.LG·February 10, 2023

TL;DR

This paper introduces an unsupervised method for initializing deep neural networks that encourages a diverse set of downstream tasks to lie near the initial parameters, which improves generalization, especially when labeled data are limited.

Contribution

It proposes an unsupervised algorithm, based on Maximum Mean Discrepancy (MMD), that searches for initial parameters from which many different downstream tasks can be learned well.
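MMD compares two distributions through the mean embeddings of their samples in a kernel feature space. A minimal NumPy sketch of the standard unbiased estimator of squared MMD with a Gaussian (RBF) kernel follows. The kernel choice, the `bandwidth` parameter, and the names `rbf_kernel` and `mmd2_unbiased` are assumptions for illustration; this shows only the statistic the method builds on, not the paper's training objective.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negatives from rounding
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = k_xy.mean()
    return term_xx + term_yy - 2.0 * term_xy

# Usage: two samples from the same distribution vs. a shifted one.
rng = np.random.default_rng(0)
same_a = rng.normal(size=(200, 2))
same_b = rng.normal(size=(200, 2))
shifted = rng.normal(loc=2.0, size=(200, 2))
print(mmd2_unbiased(same_a, same_b))   # close to 0
print(mmd2_unbiased(same_a, shifted))  # clearly positive
```

Samples from the same distribution give an estimate near zero (possibly slightly negative, since the estimator is unbiased rather than non-negative), while the shifted sample gives a clearly positive value.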

Findings

1. Improves average test accuracy on MNIST-based downstream tasks.
2. The benefit is larger when fewer labeled examples are available.
3. Encourages a diverse set of downstream tasks near the initial parameters.

Abstract

Despite the recent success of stochastic gradient descent in deep learning, a deep neural network is often difficult to train when its initial parameters are chosen inappropriately. Even when training succeeds, the initial parameter configuration is known to be able to hurt generalization. In this paper, we propose an unsupervised algorithm that finds a good initialization for the input data, given that the downstream task is d-way classification. We first observe that each parameter configuration in the parameter space corresponds to one particular d-way classification downstream task. We then conjecture that the success of learning is directly related to how diverse the downstream tasks are in the vicinity of the initial parameters. We therefore design an algorithm that encourages small perturbations of the initial parameter configuration to lead to a diverse set of d-way…
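The observation that a parameter configuration induces one d-way classification task can be made concrete with a toy example. The sketch below assumes a hypothetical two-layer tanh network (`init_params`, `induced_task`, `perturb`, and `label_agreement` are illustrative names, not from the paper) and random Gaussian vectors standing in for unlabeled input data. It labels each input by the argmax of the network output and measures how much small Gaussian perturbations of the parameters change that labeling. It does not implement the paper's MMD-based algorithm; it only illustrates what "downstream tasks in the vicinity of the initial parameters" refers to.

```python
import numpy as np

def init_params(rng, in_dim, hidden, d):
    """Hypothetical two-layer MLP parameters with a scaled Gaussian init."""
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden)),
        "W2": rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, d)),
    }

def induced_task(params, x):
    """The d-way labeling that a parameter configuration assigns to inputs x."""
    h = np.tanh(x @ params["W1"])
    return np.argmax(h @ params["W2"], axis=1)

def perturb(params, rng, sigma):
    """Add isotropic Gaussian noise of scale sigma to every parameter tensor."""
    return {k: v + sigma * rng.normal(size=v.shape) for k, v in params.items()}

def label_agreement(labels_a, labels_b):
    """Fraction of inputs on which two labelings agree."""
    return float(np.mean(labels_a == labels_b))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 20))  # stand-in for unlabeled input data
theta0 = init_params(rng, in_dim=20, hidden=64, d=10)
base = induced_task(theta0, x)  # the task induced by the initial parameters

for sigma in (0.0, 0.01, 0.1, 1.0):
    agreements = [
        label_agreement(base, induced_task(perturb(theta0, rng, sigma), x))
        for _ in range(20)
    ]
    print(f"sigma={sigma}: mean agreement with theta0's task = {np.mean(agreements):.3f}")
```

With `sigma=0.0` the perturbed network reproduces the original labeling exactly; as `sigma` grows, agreement drops and the perturbed parameters define increasingly different tasks. The paper's conjecture concerns how diverse these nearby tasks are.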

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
