Efficient $1$-bit tensor approximations
Alex W. Neal Riasanovsky, Sarah El Kazdadi

TL;DR
This paper introduces a highly efficient method for approximating matrices and tensors using 1-bit valued vectors, enabling significant spatial compression with minimal error, and demonstrates its application to compressing large language model weights and images.
Contribution
The paper presents a novel 1-bit tensor decomposition technique that is simple, fast, and memory-efficient, extending previous work to tensors and practical large-scale applications.
Findings
Achieved 50% spatial compression of Mistral-7B model weights with less than 6% error.
The decomposition algorithm is simple, requiring only 20 lines of pseudocode.
An open-source implementation, optimized with SIMD instructions, is provided.
Abstract
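We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors. For any matrix $A \in \mathbb{R}^{m \times n}$, $$A - R_w = S_w C_w T_w^\top = \sum_{j=1}^{w} c_j \cdot \mathbf{s}_j \mathbf{t}_j^\top$$ is a {\it $w$-width signed cut decomposition of $A$}. Here $C_w = \operatorname{diag}(\mathbf{c}_w)$ for some $\mathbf{c}_w \in \mathbb{R}^w$, and $S_w, T_w$, and the vectors $\mathbf{s}_j, \mathbf{t}_j$ are $\{-1, 1\}$-valued. To store $(S_w, T_w, C_w)$, we may pack $w \cdot (m + n)$ bits, and require only $w$ floating point numbers. As a function of $w$, $\|R_w\|_F$ exhibits exponential decay when applied to \textit{f32} matrices with i.i.d. $\mathcal{N}(0, 1)$ entries. Choosing $w$ so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable.…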
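To make the shape of such a decomposition concrete, here is a minimal NumPy sketch. It is not the authors' algorithm, whose pseudocode is not reproduced on this page; it is a generic greedy scheme that assumes each term is found by alternating sign updates on the residual. The function names `signed_cut_decomposition` and `width_for_f16_footprint`, the `sweeps` parameter, and the choice of 32-bit coefficients are all illustrative assumptions.

```python
import numpy as np


def signed_cut_decomposition(A, width, sweeps=10, seed=0):
    """Greedy sketch (an assumption, not the paper's method).

    Returns S (m x width), c (width), T (n x width) and the residual R,
    with S and T taking values in {-1, +1} and A - R == S @ diag(c) @ T.T.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = np.array(A, dtype=np.float64)  # residual, starts as a copy of A
    S = np.empty((m, width), dtype=np.int8)
    T = np.empty((n, width), dtype=np.int8)
    c = np.empty(width)
    for j in range(width):
        # Random sign start, then alternate: each update cannot decrease s^T R t.
        t = rng.choice([-1.0, 1.0], size=n)
        for _ in range(sweeps):
            s = np.where(R @ t >= 0, 1.0, -1.0)
            t = np.where(R.T @ s >= 0, 1.0, -1.0)
        # Least-squares coefficient for the rank-one sign matrix s t^T,
        # whose squared Frobenius norm is m * n.
        c[j] = (s @ R @ t) / (m * n)
        R -= c[j] * np.outer(s, t)
        S[:, j], T[:, j] = s, t
    return S, c, T, R


def width_for_f16_footprint(m, n, coeff_bits=32):
    """Largest width whose packed size fits in a 16-bit m x n matrix.

    Each term costs (m + n) sign bits plus one coefficient of coeff_bits bits.
    """
    return (16 * m * n) // (m + n + coeff_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((64, 64))
    w = width_for_f16_footprint(*A.shape)
    S, c, T, R = signed_cut_decomposition(A, w)
    print("width:", w)
    print("relative error:", np.linalg.norm(R) / np.linalg.norm(A))
```

Because each coefficient is the least-squares fit for its sign pair, the Frobenius norm of the residual never increases from one term to the next. How fast it decays in this sketch need not match the rates reported in the paper.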
Taxonomy
Topics: Tensor decomposition and applications · Computational Physics and Python Applications
Methods: Exponential Decay
