VQ4ALL: Efficient Neural Network Representation via a Universal Codebook

Juncan Deng; Shuaiting Li; Zeyu Wang; Hong Gu; Kedong Xu; Kejie Huang

arXiv:2412.06875·cs.LG·December 11, 2024

TL;DR

VQ4ALL introduces a universal codebook sharing approach for neural network compression, significantly reducing memory access and storage while maintaining high accuracy across various architectures.

Contribution

It proposes a bottom-up method for sharing a universal codebook, based on vector quantization (VQ) and kernel density estimation (KDE), which improves compression efficiency and versatility over traditional layer-specific techniques.
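The contribution pairs VQ with kernel density estimation. One plausible reading, given here as an assumption rather than the authors' procedure, is to pool weight sub-vectors from several networks, fit a Gaussian KDE to them, and sample the shared codebook from that density. The sub-vector length `d`, the Scott's-rule bandwidth, and the helpers `pool_subvectors` and `sample_kde_codebook` are hypothetical; random tensors stand in for real network weights.

```python
import numpy as np

def pool_subvectors(weight_tensors, d):
    """Flatten each tensor and cut it into consecutive length-d sub-vectors."""
    chunks = []
    for w in weight_tensors:
        flat = np.asarray(w, dtype=np.float64).ravel()
        usable = (flat.size // d) * d  # drop any ragged tail for simplicity
        chunks.append(flat[:usable].reshape(-1, d))
    return np.concatenate(chunks, axis=0)

def sample_kde_codebook(subvectors, num_codewords, rng):
    """Draw codewords from a Gaussian KDE fitted to the pooled sub-vectors.

    Sampling a Gaussian KDE amounts to picking a data point uniformly at
    random and adding Gaussian noise scaled by the kernel bandwidth.
    """
    n, d = subvectors.shape
    # Per-dimension bandwidth from Scott's rule (an assumption, not the paper's choice).
    bandwidth = subvectors.std(axis=0) * n ** (-1.0 / (d + 4))
    centers = subvectors[rng.integers(0, n, size=num_codewords)]
    noise = rng.standard_normal((num_codewords, d)) * bandwidth
    return centers + noise

rng = np.random.default_rng(0)
# Hypothetical layers standing in for weights from two different networks.
net_a = [rng.standard_normal((64, 32)), rng.standard_normal(128)]
net_b = [rng.standard_normal((16, 16, 3, 3))]
pooled = pool_subvectors(net_a + net_b, d=4)
codebook = sample_kde_codebook(pooled, num_codewords=256, rng=rng)
print(pooled.shape, codebook.shape)  # (1120, 4) (256, 4)
```

Because every network contributes sub-vectors to the same pool, the sampled codebook is not tied to any single layer, which is the property a universal codebook needs.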

Findings

1. Achieves a compression rate of over 16× while maintaining accuracy
2. Reduces memory access and chip area by storing static code tables in ROM
3. Demonstrates effectiveness across multiple neural network architectures
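To see where a figure like 16× can come from, the arithmetic below assumes 32-bit floating-point weights grouped into sub-vectors of length d = 4 and a codebook of K = 256 codewords, so each 128-bit group is replaced by one 8-bit index (128 / 8 = 16). These values are illustrative assumptions, not settings reported in this summary, and the helper `vq_compression_ratio` is hypothetical.

```python
import math

def vq_compression_ratio(num_weights, d, num_codewords, weight_bits=32,
                         include_codebook=True):
    """Dense storage divided by VQ storage for a block of num_weights weights.

    Every group of d weights (d * weight_bits bits) becomes one index of
    ceil(log2(num_codewords)) bits. If include_codebook is True, the codebook
    (num_codewords * d * weight_bits bits) is charged to this block as well.
    """
    dense_bits = num_weights * weight_bits
    index_bits = math.ceil(math.log2(num_codewords))
    vq_bits = (num_weights // d) * index_bits
    if include_codebook:
        vq_bits += num_codewords * d * weight_bits
    return dense_bits / vq_bits

n = 1_000_000  # hypothetical layer size, divisible by d
# Shared codebook kept in ROM: its cost is not charged to the layer.
print(vq_compression_ratio(n, d=4, num_codewords=256, include_codebook=False))  # 16.0
# Per-layer codebook: the table's own storage lowers the effective ratio.
print(vq_compression_ratio(n, d=4, num_codewords=256, include_codebook=True))
```

Charging a private codebook to each layer pulls the ratio slightly below 16×; a single codebook shared by all layers and stored in ROM avoids repeating that overhead.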

Abstract

The rapid growth of large neural network models places new demands on lightweight network representation methods. Traditional model-compression methods have achieved great success, especially vector quantization (VQ), which attains high compression ratios by sharing codewords. However, because each layer of the network must build its own codebook, the traditional top-down compression approach overlooks underlying commonalities, resulting in a limited compression rate and frequent memory access. In this paper, we propose a bottom-up method that shares a universal codebook among multiple neural networks, which not only reduces the number of codebooks but also further reduces memory access and chip area by storing the static code tables in built-in ROM. Specifically, we introduce VQ4ALL, a VQ-based method that utilizes codewords to…
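The contrast between per-layer code tables and one shared codebook can be illustrated with plain nearest-neighbour VQ. This is a minimal NumPy sketch, not VQ4ALL's assignment procedure (the abstract is truncated before describing it); the random codebook, the layer shapes, and the helpers `quantize_with_codebook` and `codebook_floats` are hypothetical.

```python
import numpy as np

def quantize_with_codebook(weight, codebook):
    """Map each length-d sub-vector of `weight` to its nearest codeword."""
    d = codebook.shape[1]
    flat = weight.reshape(-1, d)  # assumes weight.size is divisible by d
    # Squared Euclidean distance from every sub-vector to every codeword.
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # only these indices need to be stored
    reconstructed = codebook[indices].reshape(weight.shape)
    return indices, reconstructed

def codebook_floats(num_layers, num_codewords, d, shared):
    """Codebook entries stored: one table per layer, or one shared table."""
    tables = 1 if shared else num_layers
    return tables * num_codewords * d

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 4))  # hypothetical universal codebook
layers = [rng.standard_normal((32, 16)), rng.standard_normal((8, 8, 3, 3))]
for w in layers:
    idx, w_hat = quantize_with_codebook(w, codebook)
    print(w.shape, idx.shape, w_hat.shape)
# 50 layers with private tables vs. a single shared table.
print(codebook_floats(50, 256, 4, shared=False), codebook_floats(50, 256, 4, shared=True))  # 51200 1024
```

Every layer, from any network, is encoded against the same table, so only the index arrays differ between models; that is what makes it possible to keep the table fixed in ROM.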

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate