Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

Zihan Wu; Zhaoke Huang; Hong Yan

arXiv:2410.18113 · cs.DC · March 20, 2025

TL;DR

This paper introduces a scalable co-clustering approach for large datasets that combines dynamic matrix partitioning with hierarchical merging, cutting computation time by 83% on dense matrices and 30% on sparse matrices while still uncovering detailed data patterns.

Contribution

The paper presents a large-matrix partitioning algorithm and a hierarchical co-cluster merging algorithm that together improve the scalability and robustness of co-clustering for high-dimensional data.

Findings

01. 83% reduction in computation time for dense matrices
02. 30% reduction in computation time for sparse matrices
03. Effective uncovering of intricate data patterns

Abstract

Co-clustering simultaneously clusters rows and columns, revealing more fine-grained groups. However, existing co-clustering methods suffer from poor scalability and cannot handle large-scale data. This paper presents a novel and scalable co-clustering method designed to uncover intricate patterns in high-dimensional, large-scale datasets. Specifically, we first propose a large matrix partitioning algorithm that partitions a large matrix into smaller submatrices, enabling parallel co-clustering. This method employs a probabilistic model to optimize the configuration of submatrices, balancing the computational efficiency and depth of analysis. Additionally, we propose a hierarchical co-cluster merging algorithm that efficiently identifies and merges co-clusters from these submatrices, enhancing the robustness and reliability of the process. Extensive evaluations validate the effectiveness…
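
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the random shuffling used to form submatrices, the Jaccard-overlap merge rule, and the 0.5 threshold are all assumptions made for illustration, and the probabilistic model that selects the submatrix configuration is not reproduced. The per-submatrix co-clustering step is left out; any co-clustering method could be run on each block independently, which is what allows that step to run in parallel.

```python
import random
from itertools import combinations


def partition_indices(n, n_blocks, rng):
    """Shuffle 0..n-1 and deal the indices into n_blocks near-equal groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [sorted(idx[i::n_blocks]) for i in range(n_blocks)]


def partition_matrix(n_rows, n_cols, row_blocks, col_blocks, seed=0):
    """Return (row_indices, col_indices) pairs, one per submatrix.

    Hypothetical helper: the paper chooses the configuration with a
    probabilistic model; here the block counts are simply given.
    """
    rng = random.Random(seed)
    rows = partition_indices(n_rows, row_blocks, rng)
    cols = partition_indices(n_cols, col_blocks, rng)
    return [(r, c) for r in rows for c in cols]


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def merge_coclusters(coclusters, threshold=0.5):
    """Greedy agglomerative merge of (row_set, col_set) co-clusters.

    Assumed rule: repeatedly merge the most similar pair whose row AND
    column Jaccard similarities both reach the threshold.
    """
    clusters = [(set(r), set(c)) for r, c in coclusters]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sim = min(jaccard(clusters[i][0], clusters[j][0]),
                      jaccard(clusters[i][1], clusters[j][1]))
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, i, j)
        if best is None:
            return clusters
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)


if __name__ == "__main__":
    # Split a 6 x 4 matrix into 2 x 2 = 4 submatrices.
    for rows, cols in partition_matrix(6, 4, row_blocks=2, col_blocks=2):
        print("block rows", rows, "cols", cols)

    # Hand-written candidates, as if found in different partition rounds.
    candidates = [({0, 1, 2}, {0, 1}), ({1, 2, 3}, {0, 1}), ({7, 8}, {5})]
    print(merge_coclusters(candidates))
```

Within a single partition the blocks share no cell, so overlapping candidates only appear when the partitioning is repeated with different seeds. The sketch assumes that setup, which is why the merge step compares row and column index sets across candidates.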

Taxonomy

Topics: Advanced Clustering Algorithms Research