The Role of Information Complexity and Randomization in Representation Learning
Matías Vera, Pablo Piantanida, Leonardo Rey Vega

TL;DR
This paper investigates how information complexity and randomization techniques like Dropout influence the generalization ability of neural network encoders, providing theoretical bounds and empirical evidence on their roles.
Contribution
It introduces a sample-dependent bound on the generalization gap based on information complexity and explores how regularization methods such as Dropout affect encoder capacity.
Findings
The generalization gap correlates with information complexity in neural networks.
SGD implicitly minimizes information complexity during training.
Dropout reduces information complexity, improving generalization.
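As a toy illustration of the last finding, and not the paper's experimental setup, the sketch below computes the information complexity I(X;U) exactly for a fixed, hypothetical deterministic encoder that maps four equiprobable inputs to 2-bit codes, with inverted Dropout (drop rate `p`) applied to the code units. The helper names `mutual_information_bits` and `dropout_channel` are illustrative assumptions. Because Dropout here only post-processes a fixed representation, the data processing inequality guarantees that I(X;U) cannot increase. The networks studied in the paper are trained rather than fixed, so this shows the mechanism, not the paper's measurements.

```python
import itertools

import numpy as np


def mutual_information_bits(p_x, p_u_given_x):
    """Exact I(X;U) in bits for input distribution p_x and channel p(u|x)."""
    p_xu = p_x[:, None] * p_u_given_x            # joint p(x, u)
    p_u = p_xu.sum(axis=0)                        # marginal p(u)
    mask = p_xu > 0                               # 0 * log 0 terms contribute nothing
    ratio = p_xu[mask] / (p_x[:, None] * p_u[None, :])[mask]
    return float(np.sum(p_xu[mask] * np.log2(ratio)))


def dropout_channel(codes, p):
    """Channel p(u|x) induced by inverted Dropout on fixed binary codes.

    Each unit is kept with probability 1 - p (and rescaled by 1 / (1 - p))
    or zeroed with probability p. For p < 1 the rescaling is invertible,
    so an output can be identified with its pattern of nonzero units.
    """
    n_units = codes.shape[1]
    patterns = list(itertools.product([0, 1], repeat=n_units))
    index = {pattern: i for i, pattern in enumerate(patterns)}
    channel = np.zeros((len(codes), len(patterns)))
    for x, code in enumerate(codes):
        # Enumerate every Dropout mask together with its probability.
        for keep in itertools.product([0, 1], repeat=n_units):
            prob = np.prod([(1 - p) if k else p for k in keep])
            out = tuple(int(c * k) for c, k in zip(code, keep))
            channel[x, index[out]] += prob
    return channel


# Hypothetical encoder: four equiprobable inputs mapped to 2-bit codes.
codes = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
p_x = np.full(4, 0.25)

for p in (0.0, 0.2, 0.5):
    ic = mutual_information_bits(p_x, dropout_channel(codes, p))
    print(f"p = {p}: I(X;U) = {ic:.4f} bits")
```

With no Dropout the code is lossless and I(X;U) is 2 bits; at `p = 0.5` each active unit behaves like a Z-channel and I(X;U) falls to about 0.62 bits.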
Abstract
A grand challenge in representation learning is to learn the different explanatory factors of variation behind high-dimensional data. Encoder models are often trained to optimize performance on training data when the real objective is to generalize well to unseen data. Although there is substantial numerical evidence suggesting that noise injection (during training) at the representation level might improve the generalization ability of encoders, an information-theoretic understanding of this principle remains elusive. This paper presents a sample-dependent bound on the generalization gap of the cross-entropy loss that scales with the information complexity (IC) of the representations, meaning the mutual information between inputs and their representations. The IC is empirically investigated for standard multi-layer neural networks with SGD on the MNIST and CIFAR-10 datasets; the…
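The abstract defines IC as the mutual information between inputs X and their representations U, but it does not say how the paper estimates it. The sketch below therefore uses a generic plug-in (histogram) estimator on paired discrete samples. It assumes that continuous representations would first be discretized. Both the estimator and the hypothetical `noisy_encoder` with noise level `eps` are illustrative assumptions, not the paper's procedure.

```python
import math
import random
from collections import Counter


def plugin_mutual_information_bits(pairs):
    """Plug-in estimate of I(X;U) in bits from paired discrete samples (x, u)."""
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_u = Counter(u for _, u in pairs)
    # Sum over observed (x, u) of p(x,u) * log2(p(x,u) / (p(x) p(u))),
    # with every probability replaced by its empirical frequency.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_u[u]))
        for (x, u), c in joint.items()
    )


def noisy_encoder(x, eps, rng, n_symbols=4):
    """Hypothetical stochastic encoder: keep x w.p. 1 - eps, else a uniform random symbol."""
    return x if rng.random() > eps else rng.randrange(n_symbols)


rng = random.Random(0)
xs = [rng.randrange(4) for _ in range(10_000)]
for eps in (0.0, 0.5, 1.0):
    pairs = [(x, noisy_encoder(x, eps, rng)) for x in xs]
    print(f"eps = {eps}: estimated IC = {plugin_mutual_information_bits(pairs):.3f} bits")
```

Plug-in estimates are never negative and tend to overestimate mutual information when samples are few relative to the number of (x, u) cells. For this reason the `eps = 1.0` case, where U is independent of X, yields a value close to, but typically slightly above, zero.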
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
Methods: Dropout · Stochastic Gradient Descent
