TL;DR
SHADE introduces an information-theoretic regularization method for deep learning that improves classification performance by explicitly decoupling invariant representation learning from data fitting, with a practical stochastic implementation.
Contribution
The paper presents SHADE, a novel regularization scheme based on conditional entropy, and derives a stochastic version suitable for deep neural network training.
Findings
SHADE outperforms standard regularizers on multiple architectures.
The method effectively decouples invariant representation learning from data fitting.
Empirical results show improved classification accuracy.
Abstract
Regularization is a central issue when training deep neural networks. In this paper, we propose a new information-theory-based regularization scheme named SHADE, for SHAnnon DEcay. The originality of the approach is to define a prior based on conditional entropy, which explicitly decouples the learning of invariant representations, handled by the regularizer, from the learning of correlations between inputs and labels, handled by the data-fitting term. Our second contribution is to derive a stochastic version of the regularizer that is compatible with deep learning, resulting in a tractable training scheme. We empirically validate the effectiveness of our approach in improving classification performance compared to standard regularization schemes on several standard architectures.
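The abstract describes the objective only in words: a data-fitting term plus a conditional-entropy prior that is estimated stochastically. The sketch below is one hypothetical way to make that concrete in plain Python. It is not the authors' estimator. It assumes the regularizer is H(Z | Y), the entropy of a layer's features Z given the label Y, and that each feature is Gaussian within a class, so each per-class, per-unit entropy reduces to 0.5 * log(2*pi*e*var). The function names and the weight `beta` are illustrative. Because the term is computed on a minibatch, it is a stochastic estimate of the full-data quantity.

```python
import math
from collections import defaultdict
from typing import Sequence


def cross_entropy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy over the minibatch (the data-fitting term)."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max logit for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]  # -log softmax(row)[y]
    return total / len(labels)


def conditional_entropy_gaussian(features: Sequence[Sequence[float]],
                                 labels: Sequence[int],
                                 eps: float = 1e-6) -> float:
    """Hypothetical minibatch estimate of H(Z | Y), summed over feature units.

    Assumes every unit is Gaussian within each class, so its entropy is
    0.5 * log(2*pi*e*var); each class is weighted by its minibatch frequency p(y).
    """
    by_class = defaultdict(list)
    for z, y in zip(features, labels):
        by_class[y].append(z)
    n = len(labels)
    n_units = len(features[0])
    h = 0.0
    for rows in by_class.values():
        p_y = len(rows) / n
        for j in range(n_units):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            # eps keeps the log finite when a unit is constant within a class
            h += p_y * 0.5 * math.log(2 * math.pi * math.e * (var + eps))
    return h


def regularized_loss(logits, features, labels, beta: float = 0.1) -> float:
    """Data fitting plus beta times the (assumed) conditional-entropy prior."""
    return cross_entropy(logits, labels) + beta * conditional_entropy_gaussian(features, labels)


labels = [0, 0, 1, 1]
logits = [[2.0, 0.0], [1.5, 0.2], [0.1, 1.8], [0.0, 2.2]]
tight = [[1.0, 0.5], [1.01, 0.49], [-1.0, 0.2], [-0.99, 0.21]]  # little within-class spread
loose = [[1.0, 0.5], [2.0, -0.5], [-1.0, 0.2], [0.5, 1.5]]      # large within-class spread
print(conditional_entropy_gaussian(tight, labels) < conditional_entropy_gaussian(loose, labels))  # True
print(regularized_loss(logits, tight, labels, beta=0.1))
```

A lower regularizer value means the features vary less within each class, that is, they are more invariant to factors unrelated to the label, while the cross-entropy term alone handles the input-label correlation. Differential entropy can be negative, so only relative values matter. Each class also needs at least two samples in a minibatch; otherwise its variance is zero and the `eps` floor dominates the estimate.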
