Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings
Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

TL;DR
This paper introduces JLCM, a neural network compression method that jointly learns codebooks and mappings, reducing the memory footprint enough to load models on mobile devices.
Contribution
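The paper proposes learning codebooks and mappings jointly, targeting two limitations of existing codebook-based quantization: methods either use a single codebook per tensor or pay for a memory-expensive mapping to multiple codebooks, and gradient descent on the mapping tends to jump toward extreme values instead of performing a proximal search.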
Findings
Llama 7B compressed to 2GB
Achieves efficient approximation of DNNs
Enables deployment on old smartphones
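As a rough sanity check on the first finding, the sketch below computes the average bit budget per weight implied by a 2 GB footprint. The parameter count (7e9) and the decimal definition of a gigabyte are assumptions made for illustration, and the calculation ignores any overhead from storing codebooks and mappings.

```python
def bits_per_weight(num_params: float, target_bytes: float) -> float:
    """Average number of bits available per parameter for a given byte budget."""
    return target_bytes * 8 / num_params


# Assumption: roughly 7e9 parameters and 1 GB = 1e9 bytes.
params = 7e9
target_bytes = 2e9

bits = bits_per_weight(params, target_bytes)

# Footprint of the same model stored in 16-bit floats (2 bytes per weight).
fp16_gb = params * 2 / 1e9

print(f"{bits:.2f} bits per weight, versus {fp16_gb:.1f} GB in fp16")
```

Under these assumptions the budget is a little over 2 bits per weight, about a sevenfold reduction from 16-bit storage.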
Abstract
The massive interest in deep neural networks (DNNs) for both computer vision and natural language processing has been sparked by the growth in computational power. However, this led to an increase in the memory footprint, to a point where it can be challenging to simply load a model on commodity devices such as mobile phones. To address this limitation, quantization is a favored solution as it maps high precision tensors to a low precision, memory efficient format. In terms of memory footprint reduction, its most effective variants are based on codebooks. These methods, however, suffer from two limitations. First, they either define a single codebook for each tensor, or use a memory-expensive mapping to multiple codebooks. Second, gradient descent optimization of the mapping favors jumps toward extreme values, hence not defining a proximal search. In this work, we propose to address…
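To make the baseline concrete, here is a minimal sketch of single-codebook quantization of one weight tensor, using plain 1-D k-means (Lloyd iterations) in NumPy. This is not the JLCM method described in the paper; it only illustrates the "one codebook per tensor" setting the abstract refers to. The function names, the 16-entry codebook, and the random test tensor are all hypothetical choices.

```python
import numpy as np


def build_codebook(weights: np.ndarray, num_codes: int = 16, iters: int = 20):
    """Fit a scalar codebook with k-means and return (codebook, index map)."""
    flat = weights.ravel().astype(np.float64)
    # Initialize the codes at evenly spaced quantiles of the weights.
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, num_codes))
    for _ in range(iters):
        # Assign each weight to its nearest code.
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each code to the mean of the weights assigned to it.
        for k in range(num_codes):
            members = flat[idx == k]
            if members.size:
                codebook[k] = members.mean()
    # Final assignment against the updated codebook.
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx.reshape(weights.shape).astype(np.uint8)


def dequantize(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate tensor by looking up each index."""
    return codebook[indices]


# Hypothetical example tensor standing in for one layer's weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

codebook, indices = build_codebook(w)
w_hat = dequantize(codebook, indices)
mse = float(np.mean((w - w_hat) ** 2))
print(f"reconstruction MSE: {mse:.4f}")
```

With 16 codes, each index needs only 4 bits, and the codebook itself adds 16 stored values per tensor. The indices are kept as uint8 here for simplicity; a real deployment would pack two indices per byte.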
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Multimodal Machine Learning Applications
