On information captured by neural networks: connections with memorization and generalization
Hrayr Harutyunyan

TL;DR
This paper studies the information neural networks capture during training and connects it to memorization and generalization, covering learning with noisy labels, the informativeness of individual examples, and generalization gap bounds.
Contribution
It provides an information-theoretic framework for understanding neural network learning, including a learning algorithm that limits label noise information in the weights and a notion of the unique information an individual example provides to training.
Findings
Derives a learning algorithm that limits label noise information in the weights
Defines a notion of the unique information an individual sample provides to training
Relates example informativeness to generalization by deriving nonvacuous generalization gap bounds
Abstract
Despite the popularity and success of deep learning, there is limited understanding of when, how, and why neural networks generalize to unseen examples. Since learning can be seen as extracting information from data, we formally study information captured by neural networks during training. Specifically, we start with viewing learning in the presence of noisy labels from an information-theoretic perspective and derive a learning algorithm that limits label noise information in weights. We then define a notion of unique information that an individual sample provides to the training of a deep network, shedding some light on the behavior of neural networks on examples that are atypical, ambiguous, or belong to underrepresented subpopulations. We relate example informativeness to generalization by deriving nonvacuous generalization gap bounds. Finally, by studying knowledge distillation, we…
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification
