Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Redundant Data
Skyler Ruiter, Seth Wolfgang, Marc Tunnell, Timothy Triche Jr., Erin Carrier, Zachary DeBruine

TL;DR
This paper introduces VCSC and IVCSC, two sparse matrix storage formats that exploit data redundancy to reduce memory usage while keeping reads efficient. They target highly redundant sparse data, which is common in machine learning applications such as genomics.
Contribution
The paper presents two new sparse matrix formats, VCSC and IVCSC, both extensions of CSC, that leverage data redundancy for better compression than general-purpose formats such as CSC and COO.
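To make the idea of exploiting redundancy within a column concrete, here is a minimal Python sketch. It is not the paper's implementation: the layout (unique values, per-value counts, and row indices grouped by value) and the names `compress_column` and `decompress_column` are assumptions chosen for illustration.

```python
from collections import defaultdict


def compress_column(rows, values):
    """Group one column's nonzeros by value (illustrative layout, assumed).

    Returns (unique_values, counts, indices). `indices` holds the row
    indices of each unique value back to back, in the order of
    `unique_values`, so each distinct value is stored only once.
    """
    groups = defaultdict(list)
    for r, v in zip(rows, values):
        groups[v].append(r)
    unique_values = sorted(groups)
    counts = [len(groups[v]) for v in unique_values]
    indices = [r for v in unique_values for r in sorted(groups[v])]
    return unique_values, counts, indices


def decompress_column(unique_values, counts, indices):
    """Rebuild the (row, value) pairs of the column, sorted by row."""
    pairs = []
    pos = 0
    for v, c in zip(unique_values, counts):
        pairs.extend((r, v) for r in indices[pos:pos + c])
        pos += c
    return sorted(pairs)


# One column with six nonzeros but only two distinct values.
rows = [0, 2, 3, 5, 7, 8]
values = [1, 2, 1, 1, 2, 1]
uv, cnt, idx = compress_column(rows, values)
restored = decompress_column(uv, cnt, idx)
```

For this toy column, a CSC-style layout would store 6 values and 6 row indices, while the grouped layout stores 2 values, 2 counts, and 6 row indices. The saving grows as the number of distinct values shrinks relative to the number of nonzeros.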
Findings
VCSC achieves up to 3-fold compression over COO and 2.25-fold over CSC.
IVCSC achieves up to 10-fold compression over COO.
Both formats enable reading compressed data with minimal computational overhead.
Abstract
Compressed Sparse Column (CSC) and Coordinate (COO) are popular compression formats for sparse matrices. However, both CSC and COO are general purpose and cannot take advantage of any of the properties of the data other than sparsity, such as data redundancy. Highly redundant sparse data is common in many machine learning applications, such as genomics, and is often too large for in-core computation using conventional sparse storage formats. In this paper, we present two extensions to CSC: (1) Value-Compressed Sparse Column (VCSC) and (2) Index- and Value-Compressed Sparse Column (IVCSC). VCSC takes advantage of high redundancy within a column to further compress data up to 3-fold over COO and 2.25-fold over CSC, without significant negative impact to performance characteristics. IVCSC extends VCSC by compressing index arrays through delta encoding and byte-packing, achieving a 10-fold…
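The abstract names delta encoding and byte-packing as the techniques IVCSC applies to index arrays, but does not show the layout. The sketch below is one plausible reading, under stated assumptions: indices are sorted, deltas are taken against the previous index (the first against 0), and all numbers in a run share the smallest fixed width of 1, 2, 4, or 8 bytes that fits the largest. The function names are hypothetical.

```python
def delta_encode(sorted_indices):
    """Replace each index with its gap from the previous one (first gap is from 0)."""
    prev = 0
    deltas = []
    for i in sorted_indices:
        deltas.append(i - prev)
        prev = i
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def byte_pack(numbers):
    """Pack non-negative ints at the smallest shared width that fits the largest.

    Assumed scheme: widths of 1, 2, 4, or 8 bytes; values of 2**64 or more
    are not supported by this sketch.
    """
    largest = max(numbers, default=0)
    width = next(w for w in (1, 2, 4, 8) if largest < 1 << (8 * w))
    payload = b"".join(n.to_bytes(width, "little") for n in numbers)
    return width, payload


def byte_unpack(width, payload):
    """Invert byte_pack."""
    return [
        int.from_bytes(payload[i:i + width], "little")
        for i in range(0, len(payload), width)
    ]


indices = [5, 260, 300, 500, 700]
deltas = delta_encode(indices)

raw_width, raw_payload = byte_pack(indices)      # indices need 2 bytes each
delta_width, delta_payload = byte_pack(deltas)   # gaps all fit in 1 byte

recovered = delta_decode(byte_unpack(delta_width, delta_payload))
```

In this example the raw indices need two bytes each, while every gap fits in a single byte, so the packed index data halves in size. Reading the indices back costs a running sum, which is the kind of decode overhead a format like this trades for the smaller footprint.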
Taxonomy
Topics: Error Correcting Code Techniques · Algorithms and Data Compression · Caching and Content Delivery
