Biclustering and Boolean Matrix Factorization in Data Streams
Stefan Neumann, Pauli Miettinen

TL;DR
This paper introduces a streaming algorithm for bipartite graph clustering and Boolean matrix factorization. It recovers the right-side clusters in a single pass using sublinear space (a second pass recovers the left-side clusters), and it runs fast while achieving quality close to that of baseline methods.
Contribution
The authors present what is, to the best of their knowledge, the first one-pass streaming algorithm for bipartite clustering that uses sublinear space. They extend it to Boolean matrix factorization and support both results with theoretical analysis.
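The extension to Boolean matrix factorization rests on the correspondence between Boolean matrices and bipartite graphs. The following is a minimal sketch of that correspondence. The helper names (`matrix_to_edges`, `boolean_product`, `factors_from_biclusters`) are hypothetical and are not taken from the paper. Rows of a Boolean matrix are treated as left vertices and columns as right vertices, and a 1 at position (i, j) is an edge. Each bicluster (a set of rows paired with a set of columns) becomes one rank-1 Boolean component. The Boolean product combines overlapping components with OR rather than with addition.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]


def matrix_to_edges(a: Matrix) -> Set[Tuple[int, int]]:
    """Edges (i, j) of the bipartite graph: left vertex i is a row, right vertex j a column."""
    return {(i, j) for i, row in enumerate(a) for j, v in enumerate(row) if v}


def boolean_product(b: Matrix, c: Matrix) -> Matrix:
    """Boolean matrix product: (B o C)[i][j] = OR over l of (B[i][l] AND C[l][j])."""
    m, k, n = len(b), len(c), len(c[0])
    return [[int(any(b[i][l] and c[l][j] for l in range(k))) for j in range(n)]
            for i in range(m)]


def factors_from_biclusters(m: int, n: int, biclusters) -> Tuple[Matrix, Matrix]:
    """Build B (m x k) and C (k x n) from a list of (row_set, col_set) biclusters (hypothetical helper)."""
    k = len(biclusters)
    b = [[0] * k for _ in range(m)]
    c = [[0] * n for _ in range(k)]
    for l, (rows, cols) in enumerate(biclusters):
        for i in rows:
            b[i][l] = 1  # row i belongs to bicluster l
        for j in cols:
            c[l][j] = 1  # column j belongs to bicluster l
    return b, c


# Example: two overlapping biclusters in a 4 x 5 matrix.
B, C = factors_from_biclusters(4, 5, [({0, 1}, {0, 1, 2}), ({1, 2, 3}, {2, 3, 4})])
A = boolean_product(B, C)
# A == [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [0, 0, 1, 1, 1], [0, 0, 1, 1, 1]]
```

Entry (1, 2) is covered by both biclusters but stays 1. This OR-based overlap is what separates Boolean factorization from ordinary real-valued factorization, and in graph terms each bicluster is a biclique whose edges may overlap with another biclique's.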
Findings
Algorithm recovers right-side clusters after one pass
Achieves near-baseline quality on real data
Scales linearly with number of edges
Abstract
We study the clustering of bipartite graphs and Boolean matrix factorization in data streams. We consider a streaming setting in which the vertices from the left side of the graph arrive one by one together with all of their incident edges. We provide an algorithm that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space; to the best of our knowledge, this is the first algorithm with this property. We also show that after a second pass over the stream, the left clusters of the bipartite graph can be recovered, and we show how to extend our algorithm to solve the Boolean matrix factorization problem (by exploiting the correspondence of Boolean matrices and bipartite graphs). We evaluate an implementation of the algorithm on synthetic data and on real-world data. On real-world datasets the algorithm is orders of magnitude…
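The streaming setting described in the abstract can be made concrete with a toy example. The abstract does not describe how the algorithm works internally, so this sketch substitutes a different, well-known technique, MinHash signatures, purely to illustrate the model. Each stream item is a left vertex together with its full neighbor list. A fixed amount of state is kept per right vertex, and the right vertices are grouped after a single pass. This is not the authors' algorithm. The class `MinHashRightSketch`, the parameter `k`, and the rule that right vertices with identical signatures form one cluster are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def _h(seed: int, u: Hashable) -> int:
    """Deterministic 64-bit hash of left vertex u under hash function number `seed`."""
    d = hashlib.blake2b(f"{seed}:{u}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big")


class MinHashRightSketch:
    """Toy one-pass sketch (hypothetical, NOT the paper's method): k MinHash values per right vertex."""

    def __init__(self, k: int = 16):
        self.k = k
        self.sig: Dict[Hashable, List[int]] = {}

    def process(self, left_vertex: Hashable, neighbors: Iterable[Hashable]) -> None:
        # One stream item: a left vertex together with all of its incident edges.
        hashes = [_h(s, left_vertex) for s in range(self.k)]
        for v in neighbors:
            cur = self.sig.get(v)
            if cur is None:
                self.sig[v] = list(hashes)  # first time right vertex v is seen
            else:
                for s in range(self.k):
                    if hashes[s] < cur[s]:
                        cur[s] = hashes[s]  # keep the minimum hash per function

    def right_clusters(self) -> List[List[Hashable]]:
        # Assumed grouping rule: identical signatures mean the same cluster.
        groups = defaultdict(list)
        for v, s in self.sig.items():
            groups[tuple(s)].append(v)
        return sorted(sorted(g) for g in groups.values())


# Example stream: (left vertex, its right-side neighbors), read once.
stream = [("u1", ["a", "b"]), ("u2", ["a", "b"]),
          ("u3", ["c", "d"]), ("u4", ["c", "d"]), ("u5", ["e"])]
sketch = MinHashRightSketch(k=16)
for left, nbrs in stream:
    sketch.process(left, nbrs)
clusters = sketch.right_clusters()
# expected: [['a', 'b'], ['c', 'd'], ['e']]
```

Right vertices with identical neighborhoods always receive identical signatures. Two different neighborhoods with Jaccard similarity J agree on all k hash functions with probability about J^k under idealized, independent hashing, so a larger k separates near-duplicate neighborhoods more reliably. The state is k integers per right vertex, no matter how many left vertices stream past.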
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Complex Network Analysis Techniques
