On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization
Zakhar Shumaylov, Vasileios Tsiaras, Yannis Stylianou

TL;DR
This paper applies information geometry to analyze and improve model compression techniques, emphasizing the importance of iterative methods and soft rank constraints for better performance and convergence.
Contribution
The paper introduces an information-geometric perspective on model compression, proves convergence of iterative singular value thresholding, and suggests modifications that improve compression performance.
Findings
Using information divergences for the projection is key to zero-shot accuracy when compressing pre-trained models.
Iterative singular value thresholding converges under soft rank constraints.
Soft rank reduction improves performance at fixed compression rates.
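To make the idea of iterative singular value thresholding concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: it is a generic proximal-gradient loop that minimizes 0.5 * ||(X - W) A||_F^2 + lam * ||X||_*, where the nuclear norm acts as a soft rank penalty. The names svt and iterative_svt, the calibration matrix A, and the weight lam are all hypothetical and chosen only for illustration.

```python
import numpy as np


def svt(M, tau):
    """Soft-threshold the singular values of M by tau (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # shrink, then clip at zero
    return (U * s_shrunk) @ Vt, s_shrunk


def iterative_svt(W, A, lam, n_iter=200):
    """Proximal gradient on 0.5 * ||(X - W) A||_F^2 + lam * ||X||_*.

    W: (m, n) weight matrix to compress.
    A: (n, k) hypothetical calibration inputs (an assumption, not from the paper).
    """
    G = A @ A.T                          # Gram matrix of the calibration inputs
    step = 1.0 / np.linalg.norm(G, 2)    # 1 / Lipschitz constant of the gradient
    X = W.copy()
    objective = []
    for _ in range(n_iter):
        grad = (X - W) @ G               # gradient of the smooth data-fit term
        X, s = svt(X - step * grad, step * lam)
        data_fit = 0.5 * np.linalg.norm((X - W) @ A) ** 2
        objective.append(data_fit + lam * s.sum())
    return X, objective


# Illustrative usage on random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
A = rng.standard_normal((6, 32))
X, objective = iterative_svt(W, A, lam=5.0)
```

With the step size set to the inverse of the largest eigenvalue of A Aᵀ, the objective does not increase from one iteration to the next. A larger lam shrinks more singular values to zero, trading reconstruction quality for lower rank.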
Abstract
The ever-increasing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of density-induced metrics on parameter spaces, to analyze existing methods within the space of model compression, primarily focusing on operator factorization. Adopting this perspective highlights the core challenge: defining an optimal low-compute submanifold (or subset) and projecting onto it. We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection. We highlight that when compressing a pre-trained model, using information divergences is paramount for achieving improved zero-shot accuracy, yet this may no longer be the case when the model is fine-tuned. In such scenarios,…
