DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression
Nobutaka Ono

TL;DR
DiBA introduces a novel matrix approximation method using diagonal and binary matrices to compress neural network weights, reducing storage and computation while maintaining accuracy.
Contribution
The paper proposes DiBA, a new matrix factorization technique with an efficient solver and retuning method for neural network weight compression.
Findings
DiBA improves SNR on 40 weight matrices from pretrained models.
DiBARD enhances accuracy in DistilBERT and Speech Commands tasks.
DiBA reduces floating-point multiplications from mn to m+k+n.
Abstract
In this paper, we propose DiBA (Diagonal and Binary Matrix Approximation), a compact matrix factorization for neural network weight compression. Many components of modern networks, including linear layers, convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates a weight matrix $W \in \mathbb{R}^{m \times n}$ by $D_1 B_1 D_2 B_2 D_3$, where $D_1, D_2, D_3$ are diagonal matrices and $B_1, B_2$ are binary matrices. The intermediate dimension $k$ controls the trade-off between theoretical storage and approximation accuracy. For matrix-vector products, DiBA decomposes dense multiplication into three element-wise scaling operations and two binary mixing operations, reducing the floating-point multiplication count from $mn$ to $m + k + n$. For optimization, we introduce DiBA-Greedy, an alternating solver that combines closed-form least-squares…
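The matrix-vector product described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `diba_matvec`, the variable names, the random factors, and the choice of {0, 1} entries for the binary matrices are all assumptions made for illustration (the summary does not say which binary alphabet the paper uses). The diagonal factors are stored as vectors of length m, k, and n, so the only floating-point multiplications that are needed in principle are the three element-wise scalings.

```python
import numpy as np


def diba_matvec(d1, B1, d2, B2, d3, x):
    """Apply diag(d1) @ B1 @ diag(d2) @ B2 @ diag(d3) to x.

    d1, d2, d3 hold the diagonals (lengths m, k, n); B1 is m x k and
    B2 is k x n with binary entries (assumed here to be 0/1).
    """
    t = d3 * x   # element-wise scaling: n multiplications
    t = B2 @ t   # binary mixing; NumPy multiplies here, but a dedicated
                 # kernel could select and add entries instead
    t = d2 * t   # element-wise scaling: k multiplications
    t = B1 @ t   # binary mixing, same remark as above
    return d1 * t  # element-wise scaling: m multiplications


# Hypothetical sizes and random factors, chosen only for the demonstration.
rng = np.random.default_rng(0)
m, k, n = 6, 8, 5
d1 = rng.standard_normal(m)
d2 = rng.standard_normal(k)
d3 = rng.standard_normal(n)
B1 = rng.integers(0, 2, size=(m, k)).astype(np.float64)
B2 = rng.integers(0, 2, size=(k, n)).astype(np.float64)
x = rng.standard_normal(n)

# Dense reference: materialize the m x n product and multiply.
W_hat = np.diag(d1) @ B1 @ np.diag(d2) @ B2 @ np.diag(d3)
y = diba_matvec(d1, B1, d2, B2, d3, x)

dense_mults = m * n        # multiplications for a dense m x n matvec
diba_mults = m + k + n     # multiplications for the three scalings
```

In this sketch `y` matches `W_hat @ x` up to floating-point rounding, while the factored form never builds the dense m x n matrix. Storage follows the same pattern: m + k + n floats for the diagonals plus (m + n) * k bits for the two binary matrices, if each binary entry is packed into one bit.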
