Efficient Correlation Clustering Methods for Large Consensus Clustering Instances

Nathan Cordner; George Kollios

arXiv:2307.03818 · cs.DS · July 11, 2023

TL;DR

This paper improves the efficiency of correlation clustering algorithms for large consensus clustering problems, making them more practical for large datasets and many input partitions, with theoretical and experimental validation.

Contribution

It provides run time and space complexity improvements for the Pivot algorithm and analyzes a sampling method to handle large numbers of input partitions.

Findings

01. Reduced Pivot's time complexity from O(|V|^2 k) to O(|V| k)

02. Sampling the input partitions yields good-quality results from smaller input sets

03. Algorithms perform well in practice even with limited samples
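
The sampling finding can be illustrated with a small sketch. It assumes the simplest possible scheme, drawing `s` of the `k` input partitions uniformly without replacement and estimating how often two elements are clustered together from that sample alone. The function names `sample_partitions` and `together_fraction` are hypothetical, and this is not claimed to be the exact sampling procedure analyzed in the paper.

```python
import random


def sample_partitions(partitions, s, seed=None):
    """Return s input partitions chosen uniformly without replacement.

    `partitions` is a list of k label lists, one label per element of V.
    Assumption: uniform sampling; the paper's scheme may differ.
    """
    if s >= len(partitions):
        return list(partitions)
    rng = random.Random(seed)
    return rng.sample(partitions, s)


def together_fraction(partitions, u, v):
    """Fraction of the given partitions that place u and v in the same cluster."""
    return sum(p[u] == p[v] for p in partitions) / len(partitions)
```

With a sample in hand, `together_fraction(sample, u, v)` serves as an estimate of the same quantity computed over all `k` partitions, at a cost proportional to `s` instead of `k`.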

Abstract

Consensus clustering (or clustering aggregation) takes as input $k$ partitions of a given ground set $V$ and seeks a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods such as the popular Pivot algorithm. Unfortunately, these methods have not proved practical for consensus clustering instances where either $k$ or $|V|$ gets large. In this paper we provide practical run time improvements for correlation clustering solvers when $|V|$ is large. We reduce the time complexity of Pivot from $O(|V|^{2} k)$ to $O(|V| k)$, and its space complexity from $O(|V|^{2})$ to $O(|V| k)$, a significant savings since in practice $k$ is much less than $|V|$. We also analyze a sampling method for these algorithms when $k$ is large, bridging the gap between running Pivot on the full set of…
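
To make the space saving concrete, here is a minimal sketch of a Pivot-style consensus clustering routine that never builds the $|V| \times |V|$ edge matrix. It keeps only the $k$ labels of each element, which is $O(|V| k)$ space, and decides on the fly whether two elements are linked. The function name `pivot_consensus`, the majority rule for a positive edge, and the tie-breaking (exactly half counts as negative) are assumptions made for illustration. This is not the authors' implementation, and it does not attempt to reach the time bound stated in the abstract.

```python
import random


def pivot_consensus(partitions, seed=None):
    """Pivot-style consensus clustering without a pairwise matrix.

    `partitions` is a list of k label lists, each of length n = |V|.
    Returns a list of n cluster ids.
    """
    k = len(partitions)
    n = len(partitions[0])
    # Per-element label tuples: O(n * k) space in total.
    labels = list(zip(*partitions))

    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # random pivot order

    cluster_of = [-1] * n  # -1 means "not yet clustered"
    next_id = 0
    for pivot in order:
        if cluster_of[pivot] != -1:
            continue
        cluster_of[pivot] = next_id
        for v in order:
            if cluster_of[v] != -1:
                continue
            # Edge sign computed on demand in O(k) time.
            agree = sum(a == b for a, b in zip(labels[pivot], labels[v]))
            # Assumption: positive edge iff a strict majority of the
            # input partitions put pivot and v in the same cluster.
            if 2 * agree > k:
                cluster_of[v] = next_id
        next_id += 1
    return cluster_of
```

For example, with the three input partitions `[[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]`, elements 0 and 1 are grouped together by two of the three inputs and elements 2 and 3 by all three, so the routine groups them as `{0, 1}` and `{2, 3}` whichever pivot is drawn first.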

Taxonomy

Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Data Management and Algorithms