DiBA: Diagonal and Binary Matrix Approximation for Neural Network Weight Compression

Nobutaka Ono

arXiv:2605.05994 · cs.LG · May 8, 2026

TL;DR

DiBA introduces a novel matrix approximation method using diagonal and binary matrices to compress neural network weights, reducing storage and computation while maintaining accuracy.

Contribution

The paper proposes DiBA, a new matrix factorization technique for neural network weight compression, together with an efficient solver and a retuning method.

Findings

01. DiBA improves the approximation signal-to-noise ratio (SNR) on 40 weight matrices taken from pretrained models.
02. DiBARD improves accuracy on DistilBERT and Speech Commands tasks.
03. DiBA reduces the number of floating-point multiplications per matrix-vector product from $mn$ to $m + k + n$.
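The reduction from $mn$ to $m + k + n$ follows from evaluating the factorization right to left. The derivation below assumes the shapes implied by the factorization and the intermediate dimension $k$: $D_1 \in \mathbb{R}^{m \times m}$, $D_2 \in \mathbb{R}^{k \times k}$, $D_3 \in \mathbb{R}^{n \times n}$ diagonal, $B_1 \in \{0,1\}^{m \times k}$, $B_2 \in \{0,1\}^{k \times n}$, and an input $x \in \mathbb{R}^n$.

```latex
\begin{aligned}
v  &= D_3 x,   && \text{$n$ multiplications (element-wise scaling)}\\
u  &= B_2 v,   && \text{additions only, since } B_2 \in \{0,1\}^{k \times n}\\
u' &= D_2 u,   && \text{$k$ multiplications}\\
w  &= B_1 u',  && \text{additions only, since } B_1 \in \{0,1\}^{m \times k}\\
y  &= D_1 w,   && \text{$m$ multiplications}\\
\#\text{mult}(y) &= n + k + m \quad \text{vs.} \quad mn \text{ for a dense } Ax
\end{aligned}
```

As an illustrative choice (not a setting from the paper), $m = n = k = 768$ gives $2{,}304$ multiplications instead of $589{,}824$.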

Abstract

In this paper, we propose DiBA (Diagonal and Binary Matrix Approximation), a compact matrix factorization for neural network weight compression. Many components of modern networks, including linear layers, $1 \times 1$ convolutions, attention projections, and embedding layers, have dense matrix weights. DiBA approximates $A \in \mathbb{R}^{m \times n}$ by $A \approx D_{1} B_{1} D_{2} B_{2} D_{3}$, where $D_{1}, D_{2}, D_{3}$ are diagonal matrices and $B_{1}, B_{2}$ are $0/1$ binary matrices. The intermediate dimension $k$ controls the trade-off between theoretical storage and approximation accuracy. For matrix-vector products, DiBA decomposes dense multiplication into three element-wise scaling operations and two binary mixing operations, reducing the floating-point multiplication count from $mn$ to $m + k + n$. For optimization, we introduce DiBA-Greedy, an alternating solver that combines closed-form least-squares…
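A rough NumPy sketch of the matrix-vector path described above: it evaluates $D_1 B_1 D_2 B_2 D_3 x$ right to left, keeps the binary matrices as boolean masks so that the two mixing steps use only additions, counts the floating-point multiplications, and checks the result against the materialized dense product. The helper names (`diba_matvec`, `dense_from_diba`, `storage_bits`), the random factors, and the storage model (32-bit floats for the three diagonals, 1 bit per binary entry) are hypothetical illustrations, not the paper's implementation; the sketch does not reproduce DiBA-Greedy or any fitting procedure.

```python
import numpy as np


def diba_matvec(d1, B1, d2, B2, d3, x):
    """Hypothetical helper: y = D1 B1 D2 B2 D3 x without forming the m x n matrix.

    d1 (m,), d2 (k,), d3 (n,) hold the diagonals of D1, D2, D3.
    B1 (m, k) and B2 (k, n) are boolean masks standing in for the 0/1 matrices.
    Returns y and the number of floating-point multiplications performed.
    """
    mults = 0
    v = d3 * x                                  # scale by D3: n multiplications
    mults += v.size
    u = np.array([v[row].sum() for row in B2])  # mix by B2: additions only
    u = d2 * u                                  # scale by D2: k multiplications
    mults += u.size
    w = np.array([u[row].sum() for row in B1])  # mix by B1: additions only
    y = d1 * w                                  # scale by D1: m multiplications
    mults += y.size
    return y, mults


def dense_from_diba(d1, B1, d2, B2, d3):
    """Hypothetical helper: materialize D1 B1 D2 B2 D3 as a dense m x n array."""
    left = d1[:, None] * B1                # D1 B1, shape (m, k)
    right = d2[:, None] * B2               # D2 B2, shape (k, n)
    return (left @ right) * d3[None, :]    # right-multiplying by D3 scales columns


def storage_bits(m, k, n, float_bits=32):
    """Assumed storage model: float diagonals plus 1 bit per binary entry."""
    dense = m * n * float_bits
    diba = (m + k + n) * float_bits + (m * k + k * n)
    return dense, diba


rng = np.random.default_rng(0)
m, k, n = 6, 4, 5
d1, d2, d3 = rng.standard_normal(m), rng.standard_normal(k), rng.standard_normal(n)
B1 = rng.random((m, k)) < 0.5
B2 = rng.random((k, n)) < 0.5
x = rng.standard_normal(n)

y, mults = diba_matvec(d1, B1, d2, B2, d3, x)
A_hat = dense_from_diba(d1, B1, d2, B2, d3)
print(np.allclose(y, A_hat @ x))    # True
print(mults, m * n)                 # 15 30
print(storage_bits(768, 768, 768))  # (18874368, 1253376)
```

Keeping $B_1$ and $B_2$ as masks is what makes the mixing steps multiplication-free. The Python loops are only for readability; a practical kernel would use bit-packed masks and vectorized accumulation, and at these toy sizes this sketch is much slower than a dense `A @ x`.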
