Efficient Correlation Clustering Methods for Large Consensus Clustering Instances
Nathan Cordner, George Kollios

TL;DR
This paper improves the efficiency of correlation clustering algorithms for large consensus clustering problems, making them more practical for large datasets and many input partitions, with theoretical and experimental validation.
Contribution
It provides run time and space complexity improvements for the Pivot algorithm and analyzes a sampling method to handle large numbers of input partitions.
Findings
Reduced Pivot's time complexity to O(|V|k)
Sampling methods yield quality results with smaller input sets
Algorithms perform well in practice even with limited samples
Abstract
Consensus clustering (or clustering aggregation) inputs k partitions of a given ground set V, and seeks to create a single partition that minimizes disagreement with all input partitions. State-of-the-art algorithms for consensus clustering are based on correlation clustering methods like the popular Pivot algorithm. Unfortunately, these methods have not proved to be practical for consensus clustering instances where either k or V gets large. In this paper we provide practical run time improvements for correlation clustering solvers when V is large. We reduce the time complexity of Pivot from O(|V|^2 k) to O(|V|k), and its space complexity from O(|V|^2) to O(|V|k), a significant savings since in practice k is much less than |V|. We also analyze a sampling method for these algorithms when k is large, bridging the gap between running Pivot on the full set of…
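To make the setting concrete, here is a minimal Python sketch of a Pivot-style consensus step over k input partitions of a ground set V. It is an illustration under stated assumptions, not the authors' implementation. The function name `pivot_consensus`, the label-array input format, the strict-majority rule for joining a pivot, and the optional `sample_size` parameter are all hypothetical choices made for this example. The sketch stores only the k label arrays (O(|V|k) space) rather than a |V| x |V| distance matrix, but it makes no attempt to reach the paper's improved running time.

```python
import random
from typing import List, Optional, Sequence


def pivot_consensus(
    partitions: Sequence[Sequence[int]],
    sample_size: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Pivot-style consensus clustering (illustrative sketch only).

    partitions[i][v] is the cluster label of element v in input partition i.
    Returns one consensus cluster label per element.
    """
    rng = random.Random(seed)
    parts = list(partitions)
    # Assumption: optionally work on a uniform random sample of the inputs.
    if sample_size is not None and sample_size < len(parts):
        parts = rng.sample(parts, sample_size)
    k = len(parts)
    n = len(parts[0])

    consensus = [-1] * n          # -1 marks "not yet clustered"
    order = list(range(n))
    rng.shuffle(order)            # random order == random pivot choice
    next_label = 0

    for pivot in order:
        if consensus[pivot] != -1:
            continue              # already absorbed by an earlier pivot
        consensus[pivot] = next_label
        for u in range(n):
            if consensus[u] != -1:
                continue
            # Count the partitions that place u and the pivot together,
            # computed on the fly instead of from a stored matrix.
            together = sum(1 for p in parts if p[u] == p[pivot])
            # Assumption: u joins only on a strict majority; ties stay apart.
            if 2 * together > k:
                consensus[u] = next_label
        next_label += 1
    return consensus
```

For example, `pivot_consensus([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])` groups elements 0 and 1 together and elements 2 and 3 together, whichever pivot order is drawn. Setting `sample_size=1` reduces the procedure to returning a single randomly chosen input partition.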
Taxonomy
Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Data Management and Algorithms
