Matrix Compression via Randomized Low Rank and Low Precision Factorization
Rajarshi Saha, Varun Srivastava, Mert Pilanci

TL;DR
This paper introduces a randomized low rank and low precision matrix factorization algorithm that effectively compresses large matrices, enabling significant storage reduction while maintaining or improving task performance.
Contribution
The paper presents a novel algorithm combining randomized sketching and quantization for low rank matrix approximation and compression, with theoretical error bounds and practical applications.
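The quantization half of this combination can be illustrated with a generic B-bit uniform scalar quantizer. This is a minimal sketch under stated assumptions: the helper name `uniform_quantize` and the dynamic range r = max|x| are hypothetical choices, not taken from the paper. Each entry is rounded to one of 2^B evenly spaced levels on [-r, r], so only a B-bit integer code per entry (plus r) needs to be stored, and the per-entry error is at most half the level spacing.

```python
import numpy as np


def uniform_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Round every entry of x to one of 2**bits evenly spaced levels on [-r, r],
    where r = max|x|. Hypothetical helper, assumed for illustration only."""
    r = float(np.max(np.abs(x)))
    if r == 0.0:
        return np.zeros_like(x, dtype=float)
    levels = 2 ** bits
    step = 2 * r / (levels - 1)                               # spacing between adjacent levels
    codes = np.clip(np.round((x + r) / step), 0, levels - 1)  # integer codes 0 .. levels-1 (what gets stored)
    return codes * step - r                                   # map codes back to real values


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((4, 5))
    for bits in (1, 2, 4, 8):
        q = uniform_quantize(x, bits)
        # number of distinct levels used, and the worst-case rounding error
        print(bits, len(np.unique(q)), float(np.max(np.abs(q - x))))
```

Fewer bits means a coarser grid and a larger worst-case error; the printed maximum error shrinks roughly by half for each extra bit.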
Findings
Achieves compression down to one bit per matrix element.
Maintains or surpasses the performance of traditional methods on image and text tasks.
Effectively compresses the layers of large models such as LLaMA-7B.
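One way to see how a budget near one bit per element can arise: an n × d matrix stored as an n × k factor and a k × d factor, with B_L and B_R bits per factor entry, costs k(n·B_L + d·B_R) bits in total. The helper below (hypothetical name `bits_per_entry`; it ignores small overheads such as stored scale factors) computes the average per original entry. The matrix size, ranks and bit widths are illustrative assumptions, not configurations reported in the paper.

```python
def bits_per_entry(n: int, d: int, k: int, bits_left: int, bits_right: int) -> float:
    """Average bits per entry of an n x d matrix stored as
    L (n x k, bits_left bits per entry) and R (k x d, bits_right bits per entry).
    Overheads such as per-matrix scale factors are ignored."""
    total_bits = n * k * bits_left + k * d * bits_right
    return total_bits / (n * d)


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 matrix; dense 16-bit storage would cost 16 bits/entry.
    print(bits_per_entry(4096, 4096, k=256, bits_left=8, bits_right=8))
    print(bits_per_entry(4096, 4096, k=512, bits_left=4, bits_right=4))
```

Both settings work out to exactly 1.0 bit per entry, trading rank against precision; which of the two gives the smaller approximation error depends on the matrix.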
Abstract
Matrices are exceptionally useful in various fields of study as they provide a convenient framework to organize and manipulate data in a structured manner. However, modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage. Although prohibitively large, such matrices are often approximately low rank. We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix $\mathbf{A}$ as $\mathbf{A} \approx \mathbf{L}\mathbf{R}$, where $\mathbf{L}$ and $\mathbf{R}$ are the low rank factors. The total number of elements in $\mathbf{L}$ and $\mathbf{R}$ can be significantly less than that in $\mathbf{A}$. Furthermore, the entries of $\mathbf{L}$ and $\mathbf{R}$ are quantized to low precision formats, compressing $\mathbf{A}$ by giving us a low rank and low…
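The factorization described in the abstract can be sketched as follows. This is an illustrative reading of the abstract under stated assumptions, not the authors' released implementation: a Gaussian sketching matrix S, a simple uniform quantizer, and a least-squares fit for the right factor are all assumed, and the names `quantize` and `low_rank_low_precision` are hypothetical. When A is approximately low rank, the sketch AS (n × k) captures its dominant column space; quantizing AS gives L, and R is the least-squares solution of LR ≈ A, quantized in turn.

```python
import numpy as np

# Illustrative sketch only: the Gaussian sketch and uniform quantizer are assumptions.


def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform scalar quantizer on [-r, r] with r = max|x| (assumed for illustration)."""
    r = float(np.max(np.abs(x)))
    if r == 0.0:
        return np.zeros_like(x, dtype=float)
    levels = 2 ** bits
    step = 2 * r / (levels - 1)
    codes = np.clip(np.round((x + r) / step), 0, levels - 1)
    return codes * step - r


def low_rank_low_precision(A: np.ndarray, k: int, bits_left: int, bits_right: int,
                           seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Return quantized factors L (n x k) and R (k x d) with L @ R approximating A."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    S = rng.standard_normal((d, k)) / np.sqrt(k)    # Gaussian sketching matrix (assumed)
    L = quantize(A @ S, bits_left)                  # quantized sketch of A's column space
    R_full, *_ = np.linalg.lstsq(L, A, rcond=None)  # best right factor for this L
    R = quantize(R_full, bits_right)                # quantize the right factor as well
    return L, R


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 150))  # synthetic rank-10 matrix
    for bits in (2, 8, 16):
        L, R = low_rank_low_precision(A, k=20, bits_left=bits, bits_right=bits)
        rel_err = np.linalg.norm(A - L @ R) / np.linalg.norm(A)
        print(bits, L.shape, R.shape, round(float(rel_err), 4))
```

Comparing the printed relative errors for 2, 8 and 16 bits shows the precision side of the trade-off on this synthetic example; the paper's theoretical error bounds are not reproduced here.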
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms
