Partitioning-Guided K-Means: Extreme Empty Cluster Resolution for Extreme Model Compression
Tianhong Huang, Victor Agostinelli, Lizhong Chen

TL;DR
This paper introduces Partitioning-Guided K-Means, a method for resolving empty clusters in iterative product quantization that improves both model accuracy and the efficiency of empty cluster resolution under extreme deep learning model compression.
Contribution
It presents a new partitioning-guided k-means approach with strategies for initial cluster assignment, empty cluster resolution, and cluster consolidation, enhancing quantization accuracy.
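As a rough illustration of why initial cluster assignment matters, the sketch below contrasts a plain nearest-centroid assignment, which can leave clusters empty, with a partitioning-based pre-assignment that cannot. It is a minimal sketch and not the authors' implementation: this summary does not specify the partitioning rule, so the median split along the widest dimension is an assumption, and all function names (`nearest_assignment`, `count_empty`, `partition_preassign`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_assignment(points, centroids):
    """Plain k-means assignment step: index of the closest centroid per point."""
    k = len(centroids)
    return [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def count_empty(assignment, k):
    """Number of cluster indices in range(k) that received no points."""
    return k - len(set(assignment))


def partition_preassign(points, k):
    """Assumed pre-assignment rule (not taken from the paper): repeatedly split
    the largest group at the median of its widest dimension until k groups exist.
    Every split yields two non-empty halves, so no cluster starts empty."""
    if not 1 <= k <= len(points):
        raise ValueError("need 1 <= k <= number of points")
    groups = [list(range(len(points)))]
    while len(groups) < k:
        groups.sort(key=len)
        largest = groups.pop()  # has at least 2 members because len(groups) < k <= n

        def spread(d):
            vals = [points[i][d] for i in largest]
            return max(vals) - min(vals)

        widest = max(range(len(points[0])), key=spread)
        largest.sort(key=lambda i: points[i][widest])
        mid = len(largest) // 2
        groups.extend([largest[:mid], largest[mid:]])
    assignment = [0] * len(points)
    for cluster, members in enumerate(groups):
        for i in members:
            assignment[i] = cluster
    return assignment
```

With duplicated initial centroids, `nearest_assignment` sends every point to the first centroid and leaves the rest empty, whereas `partition_preassign` always returns k non-empty groups of roughly similar size. The empty cluster resolution and cluster consolidation steps named above are not sketched here, since this summary gives no detail on them.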
Findings
Reduces empty clusters by 100x on average
Uses 8x fewer iterations for empty cluster resolution
Improves model accuracy by up to 12% on GLUE tasks
Abstract
Compactness in deep learning can be critical to a model's viability in low-resource applications, and a common approach to extreme model compression is quantization. We consider Iterative Product Quantization (iPQ) with Quant-Noise to be state-of-the-art in this area, but this quantization framework suffers from preventable inference quality degradation due to prevalent empty clusters. In this paper, we propose several novel enhancements aiming to improve the accuracy of iPQ with Quant-Noise by focusing on resolving empty clusters. Our contribution, which we call Partitioning-Guided k-means (PG k-means), is a heavily augmented k-means implementation composed of three main components. First, we propose a partitioning-based pre-assignment strategy that ensures no initial empty clusters and encourages an even weight-to-cluster distribution. Second, we propose an empirically superior empty…
Taxonomy
Topics: Medical Imaging Techniques and Applications · Gaussian Processes and Bayesian Inference · Fault Detection and Control Systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Attention Dropout · WordPiece · Dense Connections · Adam · Residual Connection
