DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

Minsik Cho; Keivan A. Vahid; Saurabh Adya; Mohammad Rastegari

arXiv:2108.12659 · cs.LG · February 22, 2022 · 5 cites

TL;DR

This paper introduces DKM, a differentiable k-means clustering layer that enables effective weight clustering for neural network compression without altering the original loss or architecture.

Contribution

The authors propose a differentiable clustering layer that jointly optimizes DNN weights and clustering centroids, improving the compression-accuracy trade-off over prior methods.

Findings

01 Achieves 74.5% top-1 accuracy on ImageNet1k with ResNet50 at a 3.3 MB model size

02 Compresses MobileNet-v1 to 0.72 MB at 63.9% top-1 ImageNet1k accuracy, outperforming prior state-of-the-art compression methods

03 Compresses DistilBERT by 11.8x with minimal accuracy loss on GLUE
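As a rough illustration of where size reductions of this kind come from, the sketch below estimates the storage of a single weight tensor once every weight is replaced by an index into a small table of shared centroids. The helper name `clustered_size_bytes`, the 32-bit centroid assumption, and the tensor size are all hypothetical and chosen only for the arithmetic. They are not figures from the paper, and the estimate ignores any further encoding or per-layer choices the authors may apply.

```python
def clustered_size_bytes(num_weights: int, num_clusters: int,
                         centroid_bits: int = 32) -> float:
    """Approximate bytes needed to store one clustered weight tensor.

    Hypothetical helper: each weight is stored as an index into a
    codebook of `num_clusters` centroids, plus the codebook itself.
    """
    # Bits needed to address one of `num_clusters` centroids (ceil(log2(k))).
    index_bits = (num_clusters - 1).bit_length()
    codebook_bits = num_clusters * centroid_bits
    return (num_weights * index_bits + codebook_bits) / 8


# Illustrative numbers only (not taken from the paper).
num_weights = 1_000_000
dense_bytes = num_weights * 32 / 8                         # 32-bit floats
clustered_bytes = clustered_size_bytes(num_weights, 16)    # 4-bit indices
ratio = dense_bytes / clustered_bytes

print(f"dense:     {dense_bytes:,.0f} bytes")
print(f"clustered: {clustered_bytes:,.0f} bytes")
print(f"ratio:     {ratio:.2f}x")
```

With 16 clusters, each weight costs 4 bits instead of 32, so the ratio approaches 8x as the tensor grows and the codebook overhead becomes negligible.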

Abstract

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers superior compression and accuracy trade-off on ImageNet1k and GLUE benchmarks. For example, DKM-based compression…
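The abstract's framing of k-means as an attention problem can be pictured with a small forward-pass sketch. It assumes one common soft k-means formulation: attention scores are a softmax over negative weight-to-centroid distances, scaled by a temperature `tau`, and each centroid is updated as the attention-weighted mean of the weights. The function names, the scalar (1-D) weights, the distance choice, and the `tau` value are illustrative assumptions, not details confirmed by this page.

```python
import math


def attention_matrix(weights, centroids, tau):
    """Row i holds the soft assignment of weight i over all centroids."""
    rows = []
    for w in weights:
        # Scores are negative distances divided by the temperature.
        scores = [-abs(w - c) / tau for c in centroids]
        top = max(scores)                      # subtract max for stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def soft_kmeans(weights, init_centroids, tau=0.05, iters=20):
    """Alternate soft assignment and attention-weighted centroid updates."""
    centroids = list(init_centroids)
    for _ in range(iters):
        attn = attention_matrix(weights, centroids, tau)
        for j in range(len(centroids)):
            mass = sum(row[j] for row in attn)
            if mass > 0:  # leave a centroid unchanged if nothing attends to it
                centroids[j] = sum(row[j] * w
                                   for row, w in zip(attn, weights)) / mass
    return centroids, attention_matrix(weights, centroids, tau)


def cluster_weights(centroids, attn):
    """Each clustered weight is the attention-weighted mix of centroids."""
    return [sum(a * c for a, c in zip(row, centroids)) for row in attn]


# Toy example: six scalar weights grouped around -1 and +1.
weights = [-1.02, -0.98, -1.0, 0.99, 1.01, 1.0]
centroids, attention = soft_kmeans(weights, [-0.5, 0.5])
clustered = cluster_weights(centroids, attention)
print(centroids)
print(clustered)
```

Every step here is a smooth function of the weights and centroids, so in a framework with automatic differentiation the same computation could sit inside the training graph, which is the property that makes joint optimization of weights and centroids plausible. This pure-Python version only illustrates the forward pass.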

Videos

DKM: Differentiable k-Means Clustering Layer for Neural Network Compression · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Residual Connection · Adam · Dropout · Softmax · WordPiece · Layer Normalization