Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations
Chen Dan, Kristoffer Arnsfelt Hansen, He Jiang, Liwei Wang, Yuchen Zhou

TL;DR
This paper investigates low rank approximation of binary matrices, focusing on column subset selection and generalizations, providing approximation bounds for different binary matrix models and introducing a generalized method for improved approximation.
Contribution
It characterizes the approximation ratios of column subset selection for binary matrices and introduces a generalized approach with bounds for both GF(2) and Boolean models.
Findings
Column subset selection (CSS) has a bounded approximation ratio in the GF(2) model.
CSS is insufficient in the Boolean model, which motivates generalized CSS (GCSS).
GCSS achieves an approximation ratio bounded by 2^{k-1}+1.
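To make column subset selection concrete, here is a minimal brute-force sketch, assuming a tiny matrix given as a list of 0/1 rows. It is not the paper's algorithm or analysis, and it runs in exponential time. The names `combine` and `css_error` and the `model` strings are illustrative. The error is the number of mismatched entries, which for 0/1 matrices equals the squared Frobenius norm of the difference.

```python
from itertools import combinations, product


def combine(cols, coeffs, model):
    """Combine the chosen columns using 0/1 coefficients.

    model == "gf2": entrywise XOR of the selected columns.
    any other value: entrywise OR (Boolean semiring).
    """
    out = [0] * len(cols[0])
    for col, c in zip(cols, coeffs):
        if c:
            for i, bit in enumerate(col):
                if model == "gf2":
                    out[i] ^= bit
                else:
                    out[i] |= bit
    return out


def css_error(A, k, model="gf2"):
    """Return (error, column indices) for the best choice of k columns of A."""
    d, n = len(A), len(A[0])
    columns = [[A[i][j] for i in range(d)] for j in range(n)]
    best = None
    for subset in combinations(range(n), k):
        chosen = [columns[j] for j in subset]
        # Every vector reachable from the chosen columns (2^k candidates).
        candidates = [combine(chosen, coeffs, model)
                      for coeffs in product((0, 1), repeat=k)]
        # Each column of A is approximated independently by its closest candidate.
        err = sum(
            min(sum(a != b for a, b in zip(col, cand)) for cand in candidates)
            for col in columns
        )
        if best is None or err < best[0]:
            best = (err, subset)
    return best


# The third column is the XOR of the first two.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
print(css_error(A, 2, "gf2"))      # (0, (0, 1))
print(css_error(A, 2, "boolean"))  # (1, (0, 1))
```

In this example two columns reconstruct the matrix exactly under GF(2), while under the Boolean model the OR of the first two columns differs from the third in one entry.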
Abstract
Low rank matrix approximation is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors and patterns and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous. This leads to the problem of low rank approximation of binary matrices. Here we are given a d × n binary matrix A and a small integer k. The goal is to find two binary matrices U and V of sizes d × k and k × n respectively, so that the Frobenius norm of A − UV is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: the GF(2) model and the Boolean semiring model. Unlike low rank approximation of real matrices, which can be efficiently solved by…
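The two models differ only in how the products of bits are summed: GF(2) adds them modulo 2 (XOR), while the Boolean semiring takes their logical OR. The sketch below illustrates that difference on a small example. The function names `binary_product` and `mismatch_count` and the `model` strings are illustrative assumptions, not taken from the paper.

```python
def binary_product(U, V, model):
    """Multiply 0/1 matrices U (d x k) and V (k x n), given as lists of rows.

    model == "gf2": sum of bit products taken modulo 2.
    any other value: logical OR of the bit products (Boolean semiring).
    """
    d, k, n = len(U), len(V), len(V[0])
    out = [[0] * n for _ in range(d)]
    for i in range(d):
        for j in range(n):
            terms = [U[i][t] & V[t][j] for t in range(k)]
            out[i][j] = sum(terms) % 2 if model == "gf2" else int(any(terms))
    return out


def mismatch_count(A, B):
    """Number of differing entries; for 0/1 matrices this is the squared
    Frobenius norm of A - B."""
    return sum(a != b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


U = [[1, 1],
     [1, 0]]
V = [[1, 0],
     [1, 1]]
print(binary_product(U, V, "gf2"))      # [[0, 1], [1, 0]]
print(binary_product(U, V, "boolean"))  # [[1, 1], [1, 0]]
```

The two products differ only in the top-left entry, where both bit products equal 1: they cancel under GF(2) and do not cancel under OR.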
