Information-Theoretic Understanding of Population Risk Improvement with Model Compression
Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal V. Veeravalli

TL;DR
This paper demonstrates that model compression can improve (that is, lower) the population risk by trading a reduced generalization error against an increased empirical risk, supported by theoretical analysis and neural network experiments.
Contribution
It provides an information-theoretic framework showing how model compression acts as regularization to improve population risk, with practical insights for neural network compression.
Findings
Model compression reduces an information-theoretic bound on generalization error.
Population risk can be improved if the decrease in generalization error outweighs the increase in empirical risk.
Regularizing clustering centers enhances Hessian-weighted K-means compression.
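To make the last finding concrete, here is a minimal sketch of Hessian-weighted K-means on scalar weights with a penalty on the cluster centers. The function name, its parameters, and in particular the penalty are assumptions made for illustration: this summary does not specify the paper's regularizer, so a simple L2 shrinkage term (beta times the sum of squared centers) is swapped in because it has a closed-form center update.

```python
import numpy as np

def hessian_weighted_kmeans(w, h, k, beta=0.0, n_iter=50, seed=0):
    """Quantize scalar weights w into k shared values (hypothetical helper).

    Minimizes  sum_i h[i] * (w[i] - c[a[i]])**2 + beta * sum_j c[j]**2,
    where h[i] > 0 is a per-weight importance (e.g. a Hessian diagonal entry).
    The L2 penalty on the centers is an illustrative stand-in, not
    necessarily the regularizer used in the paper.
    """
    w = np.asarray(w, dtype=float)
    h = np.asarray(h, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct entries of w.
    centers = rng.choice(w, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: h[i] > 0 scales every candidate distance of
        # point i equally, so the nearest center is also the weighted optimum.
        assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)
        # Update step: weighted mean, shrunk toward zero by beta.
        for j in range(k):
            mask = assign == j
            if mask.any():  # empty clusters keep their previous center
                centers[j] = np.sum(h[mask] * w[mask]) / (np.sum(h[mask]) + beta)
    # Recompute assignments so they match the returned centers.
    assign = np.argmin((w[:, None] - centers[None, :]) ** 2, axis=1)
    return centers, assign

# Usage: quantize four weights to two shared values, with and without shrinkage.
weights = [0.0, 0.1, 0.9, 1.0]
importance = [1.0, 1.0, 1.0, 1.0]
plain_centers, _ = hessian_weighted_kmeans(weights, importance, k=2)
shrunk_centers, _ = hessian_weighted_kmeans(weights, importance, k=2, beta=1.0)
```

With beta set to 0 this reduces to ordinary Hessian-weighted K-means; a larger beta pulls the shared values toward zero, which raises the distortion on the training weights in exchange for a more constrained codebook.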
Abstract
We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression. We first prove that model compression reduces an information-theoretic bound on the generalization error; this allows for an interpretation of model compression as a regularization technique to avoid overfitting. We then characterize the increase in empirical risk with model compression using rate distortion theory. These results imply that the population risk could be improved by model compression if the decrease in generalization error exceeds the increase in empirical risk. We show through a linear regression example that such a decrease in population risk due to model compression is indeed possible. Our theoretical results further suggest that the…
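The tradeoff described in the abstract can be written out as a short derivation. The notation below is an assumption and does not appear in this summary: S is the training set of n samples drawn from the distribution \mu, W the pre-trained weights, \hat{W} the compressed weights, L_\mu the population risk, and L_S the empirical risk. The mutual-information bound shown is the standard one for a \sigma-sub-Gaussian loss, assumed here to be the kind of information-theoretic bound the abstract refers to.

```latex
% Population risk splits into empirical risk plus generalization error.
\mathbb{E}\big[L_\mu(\hat{W})\big]
  = \mathbb{E}\big[L_S(\hat{W})\big] + \mathrm{gen}(\mu, P_{\hat{W}\mid S})

% Standard mutual-information bound (sigma-sub-Gaussian loss, n samples).
\big|\mathrm{gen}(\mu, P_{W\mid S})\big|
  \le \sqrt{\frac{2\sigma^2}{n}\, I(S; W)}

% Compression forms the Markov chain S -> W -> \hat{W}, so by the
% data processing inequality the bound can only shrink.
I(S; \hat{W}) \le I(S; W)

% Compression helps when the drop in generalization error exceeds
% the rise in empirical risk.
\mathrm{gen}(\mu, P_{W\mid S}) - \mathrm{gen}(\mu, P_{\hat{W}\mid S})
  > \mathbb{E}\big[L_S(\hat{W})\big] - \mathbb{E}\big[L_S(W)\big]
```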
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
