Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering

Sepideh Aghamolaei; Mohammad Ghodsi

arXiv:2309.07486·cs.DS·September 15, 2023

Open Access

TL;DR

This paper introduces the heat map sorting problem, proves that it is NP-hard, gives massively parallel algorithms for it, and compares them empirically with k-means and DBSCAN as an application to explainable clustering.

Contribution

It formalizes the heat map sorting problem, proves NP-hardness, and offers fixed-parameter and approximation algorithms suitable for massively parallel computation.

Findings

01

The problem is NP-hard.

02

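The fixed-parameter algorithm runs in a constant number of rounds in the massively parallel computation model, with sublinear memory per machine and linear total memory.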

03

The algorithm is compared empirically with k-means and DBSCAN on directed and undirected graphs of email and computer networks.

Abstract

Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for an NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.
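
To make the "remains connected" condition concrete, here is a minimal sketch of a one-dimensional simplification: given the labels of the points in some proposed order, it checks that no label is split into several runs. The function name `clusters_preserved` and the restriction to a single ordering are assumptions made for illustration; the paper's definition also covers reordering and merging of dimensions, and this check addresses only the "not split" half of the condition.

```python
from itertools import groupby


def clusters_preserved(ordered_labels):
    """Return True if every label forms exactly one contiguous run.

    Hypothetical helper: a 1D simplification of the connectivity
    condition, not the paper's algorithm.
    """
    # Collapse consecutive equal labels into one entry per run.
    runs = [label for label, _ in groupby(ordered_labels)]
    # A label that appears in two separate runs has been split.
    return len(runs) == len(set(runs))
```

For example, the ordering `a, a, b, b, c` keeps every cluster connected, while `a, b, a` splits cluster `a` into two runs.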

Taxonomy

Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications