Guided Layer-wise Learning for Deep Models using Side Information
Pavel Sulimov, Elena Sukmanova, Roman Chereshnev, and Attila Kertesz-Farkas

TL;DR
This paper introduces diversifying regularization (DR), a novel technique that leverages side information to improve deep model training by mitigating local minima and vanishing gradients, leading to faster convergence and better generalization.
Contribution
The paper proposes a new regularization method, DR, that incorporates class label information into deep learning training, enhancing weight initialization and feature diversity.
Findings
DR helps mitigate vanishing gradient issues.
DR accelerates training convergence.
DR reduces generalization errors.
Abstract
Training of deep models for classification tasks is hindered by local minima problems and vanishing gradients, while unsupervised layer-wise pretraining does not exploit information from class labels. Here, we propose a new regularization technique, called diversifying regularization (DR), which applies a penalty on hidden units at any layer if they obtain similar features for different types of data. For generative models, DR is defined as a divergence over the variational posterior distributions and included in the maximum likelihood estimation as a prior. Thus, DR includes class label information in the greedy pretraining of deep belief networks, which results in a better weight initialization for fine-tuning methods. On the other hand, for discriminative training of deep neural networks, DR is defined as a distance over the features and included in the learning objective. With our…
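To make the discriminative case concrete, here is a minimal sketch of a penalty that fires when hidden features of samples from different classes lie close together. The abstract does not give the exact form of the distance, so the hinge on squared Euclidean distance, the names `diversifying_penalty` and `total_loss`, and the `margin` and `weight` parameters are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def diversifying_penalty(features, labels, margin=1.0):
    """Hypothetical DR-style penalty (an assumption, not the paper's formula).

    Returns the mean hinge loss max(0, margin - ||h_i - h_j||^2) over all
    ordered pairs (i, j) whose class labels differ.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between feature rows, shape (n, n).
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Boolean mask selecting pairs that belong to different classes.
    different = labels[:, None] != labels[None, :]
    if not different.any():
        return 0.0
    # Only different-class pairs closer than the margin contribute.
    hinge = np.maximum(0.0, margin - sq_dist)
    return float(hinge[different].mean())

def total_loss(task_loss, features, labels, weight=0.1, margin=1.0):
    """Add the weighted penalty to an already computed task loss."""
    return task_loss + weight * diversifying_penalty(features, labels, margin)
```

In this sketch the penalty is zero once features from different classes are at least `margin` apart in squared distance, so it only acts on hidden units that fail to separate the classes. The same function could be applied to the activations of any hidden layer, matching the abstract's "at any layer" wording.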
