Generalization error bounds for iterative learning algorithms with bounded updates
Jingwen Fu, Nanning Zheng

TL;DR
This paper derives new information-theoretic bounds on the generalization error of iterative learning algorithms with bounded updates, with a focus on non-convex loss functions.
Contribution
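The paper reformulates mutual information as the uncertainty of updates and uses a variance decomposition technique, in place of the chain rule of mutual information, to decompose information across iterations. Together these yield improved generalization bounds for bounded-update algorithms.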
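As an illustration of what a bounded-update algorithm can look like, the following is a minimal sketch. It assumes "bounded updates" means that every per-step parameter change has Euclidean norm at most some constant; the paper's exact definition may differ. The names `clip_update`, `run_bounded_sgd`, `toy_grad`, and `max_norm`, and the toy non-convex loss, are hypothetical and not taken from the paper.

```python
import math
import random


def clip_update(update, max_norm):
    """Rescale `update` so its Euclidean norm is at most `max_norm`."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm > max_norm:
        scale = max_norm / norm
        return [u * scale for u in update]
    return list(update)


def run_bounded_sgd(grad_fn, w0, data, lr=0.1, max_norm=0.05, epochs=3, seed=0):
    """SGD in which each step w_t = w_{t-1} + u_t has ||u_t|| <= max_norm.

    Clipping the update is only one way to enforce the bound; it is an
    assumption made for this sketch, not the paper's construction.
    """
    rng = random.Random(seed)
    w = list(w0)
    update_norms = []
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            grad = grad_fn(w, data[i])
            u = clip_update([-lr * g for g in grad], max_norm)
            w = [wi + ui for wi, ui in zip(w, u)]
            update_norms.append(math.sqrt(sum(x * x for x in u)))
    return w, update_norms


def toy_grad(w, example):
    """Gradient of the non-convex toy loss (sin(w0 * x) - y)^2 w.r.t. w0."""
    x, y = example
    residual = math.sin(w[0] * x) - y
    return [2.0 * residual * math.cos(w[0] * x) * x]


# Synthetic data generated from sin(0.7 * x); purely illustrative.
data = [(x / 10.0, math.sin(0.7 * x / 10.0)) for x in range(1, 21)]
w_final, norms = run_bounded_sgd(toy_grad, [3.0], data)
```

Every entry of `norms` stays at or below `max_norm` regardless of the loss landscape, which is the property the bounds rely on under this reading.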
Findings
Improved generalization bounds under various settings
New perspective on mutual information as update uncertainty
Analysis of large language models' scaling behavior
Abstract
This paper explores the generalization characteristics of iterative learning algorithms with bounded updates for non-convex loss functions, employing information-theoretic techniques. Our key contribution is a novel bound on the generalization error of these algorithms. Our approach introduces two main novelties: 1) we reformulate the mutual information as the uncertainty of updates, providing a new perspective, and 2) instead of using the chain rule of mutual information, we employ a variance decomposition technique to decompose information across iterations, allowing for a simpler surrogate process. We analyze our generalization bound under various settings and demonstrate improved bounds. To bridge the gap between theory and practice, we also examine the previously observed scaling behavior in large language models. Ultimately, our work takes a further step…
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
