VQ4ALL: Efficient Neural Network Representation via a Universal Codebook
Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang

TL;DR
VQ4ALL introduces a universal codebook sharing approach for neural network compression, significantly reducing memory access and storage while maintaining high accuracy across various architectures.
Contribution
It proposes a bottom-up, universal codebook sharing method using vector quantization (VQ) and kernel density estimation, improving compression efficiency and versatility over traditional layer-specific techniques.
Findings
Achieves over 16× compression rate while maintaining accuracy
Reduces memory access and chip area by storing static code tables in ROM
Demonstrates effectiveness across multiple neural network architectures
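To make the compression-rate figure concrete, the following is a minimal back-of-the-envelope sketch of how a VQ compression ratio is typically computed. The sub-vector length, codeword count, and weight count are hypothetical values chosen for illustration, not settings reported by the paper, and the function name `vq_compression_ratio` is an assumption. It also shows why keeping the codebook in ROM (so it is not counted against model storage) matters for the ratio.

```python
import math


def vq_compression_ratio(num_weights, sub_dim, num_codewords,
                         weight_bits=32, codebook_in_rom=True):
    """Ratio of uncompressed size to VQ-compressed size, both in bits.

    Each group of `sub_dim` weights is replaced by one codeword index.
    If the codebook lives in ROM and is shared, it is not counted
    against the model's storage (an assumption of this sketch).
    """
    num_subvectors = math.ceil(num_weights / sub_dim)
    index_bits = math.ceil(math.log2(num_codewords))
    compressed_bits = num_subvectors * index_bits
    if not codebook_in_rom:
        # A per-model codebook must be stored alongside the indices.
        compressed_bits += num_codewords * sub_dim * weight_bits
    return (num_weights * weight_bits) / compressed_bits


# Hypothetical configuration: 8M float32 weights, 8-dim sub-vectors,
# 65,536 codewords (16-bit indices).
shared = vq_compression_ratio(8_000_000, sub_dim=8, num_codewords=2**16)
private = vq_compression_ratio(8_000_000, sub_dim=8, num_codewords=2**16,
                               codebook_in_rom=False)
print(shared, private)
```

With these illustrative numbers, the shared-codebook case replaces 8 × 32 bits with a single 16-bit index, while a privately stored codebook of the same size roughly halves the achievable ratio.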
Abstract
The rapid growth of large neural network models creates new demands for lightweight network representations. Traditional model compression methods have been highly successful, especially vector quantization (VQ), which achieves high compression ratios by sharing codewords. However, because each layer of the network needs its own codebook, these top-down techniques overlook underlying commonalities, which limits the compression rate and causes frequent memory access. In this paper, we propose a bottom-up method that shares a universal codebook among multiple neural networks, which not only reduces the number of codebooks but also cuts memory access and chip area by storing the static codebook in built-in ROM. Specifically, we introduce VQ4ALL, a VQ-based method that utilizes codewords to…
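The idea of sharing one codebook across several networks can be illustrated with a small NumPy sketch. This is not the VQ4ALL algorithm itself: the codebook here is simply sampled from a standard normal distribution as a stand-in, assignments use plain nearest-neighbor search, and all names (`quantize_with_shared_codebook`, `dequantize`, `net_a`, `net_b`) are hypothetical. It assumes each weight tensor's size is divisible by the codeword length.

```python
import numpy as np


def quantize_with_shared_codebook(weights, codebook):
    """Map each sub-vector of `weights` to its nearest codeword index."""
    sub_dim = codebook.shape[1]
    flat = weights.reshape(-1, sub_dim)  # one row per sub-vector
    # Squared Euclidean distance between every sub-vector and codeword.
    dists = ((flat ** 2).sum(axis=1, keepdims=True)
             - 2.0 * flat @ codebook.T
             + (codebook ** 2).sum(axis=1))
    return dists.argmin(axis=1)


def dequantize(indices, codebook, shape):
    """Rebuild a weight tensor by looking up codewords."""
    return codebook[indices].reshape(shape)


rng = np.random.default_rng(0)

# One static codebook (stand-in: Gaussian samples), shared by all networks.
codebook = rng.standard_normal((256, 4))

# Two toy "networks", each reduced to a single weight matrix.
networks = {
    "net_a": rng.standard_normal((64, 32)),
    "net_b": rng.standard_normal((16, 8)),
}

# Each network stores only indices into the same codebook.
assignments = {name: quantize_with_shared_codebook(w, codebook)
               for name, w in networks.items()}
reconstructed = {name: dequantize(assignments[name], codebook, w.shape)
                 for name, w in networks.items()}

for name, w in networks.items():
    err = np.mean((w - reconstructed[name]) ** 2)
    print(name, assignments[name].shape, float(err))
```

Because the codebook is fixed and identical for every network, only the index arrays differ per model, which is what makes storing the codebook in read-only memory plausible.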
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate
