Flow-based Algorithms for Improving Clusters: A Unifying Framework, Software, and Performance
K. Fountoulakis, M. Liu, D. F. Gleich, and M. W. Mahoney

TL;DR
This paper surveys flow-based algorithms for cluster improvement. It provides a unifying framework, a practical Python implementation, and extensive experiments demonstrating the algorithms' effectiveness on real-world data analysis tasks.
Contribution
The paper introduces a fractional programming framework for understanding and developing flow-based cluster improvement algorithms, accompanied by a Python package and empirical validation.
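As a rough illustration of what a fractional programming view of cluster improvement can look like, the generic Dinkelbach-style scheme below minimizes a ratio by solving a sequence of parametric subproblems. This is a textbook statement, not quoted from the paper; the symbols $N$, $D$, $\delta_k$, and $S_k$ are notation assumed here, with conductance ($N = \mathrm{cut}$, $D = \mathrm{vol}$) as one example of such a ratio.

```latex
% Ratio objective over candidate clusters S (assumed notation):
\min_{S} \; \frac{N(S)}{D(S)}, \qquad D(S) > 0 .

% Parametric iteration starting from a given cluster S_0:
\delta_k = \frac{N(S_k)}{D(S_k)}, \qquad
S_{k+1} \in \arg\min_{S} \; \bigl( N(S) - \delta_k \, D(S) \bigr).

% If the subproblem value is negative, then N(S_{k+1}) < \delta_k D(S_{k+1}),
% so the ratio strictly decreases; if it is zero, S_k is optimal and the
% iteration stops.
```

In flow-based methods, each parametric subproblem is typically the part that is solved as a minimum cut (equivalently, maximum flow) problem.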
Findings
Flow-based algorithms are powerful for cluster refinement, both in theory and in practice.
The Python package enables practical application of these algorithms.
Numerical experiments show that the algorithms are effective on graphs built from social network and image data.
Abstract
Clustering points in a vector space or nodes in a graph is a ubiquitous primitive in statistical data analysis, and it is commonly used for exploratory data analysis. In practice, it is often of interest to "refine" or "improve" a given cluster that has been obtained by some other method. In this survey, we focus on principled algorithms for this cluster improvement problem. Many such cluster improvement algorithms are flow-based methods, by which we mean that operationally they require the solution of a sequence of maximum flow problems on a (typically implicitly) modified data graph. These cluster improvement algorithms are powerful, both in theory and in practice, but they have not been widely adopted for problems such as community detection, local graph clustering, semi-supervised learning, etc. Possible reasons for this are: the steep learning curve for these algorithms; the lack…
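To make "a sequence of maximum flow problems on a modified data graph" concrete, here is a minimal sketch in the spirit of the MQI method of Lang and Rao, one well-known flow-based improvement algorithm. It is not the authors' package or its API: the function names, the toy graph, and the pure-Python Edmonds-Karp solver are all assumptions made for illustration, and the graph is assumed undirected and unweighted with a seed set holding at most half of the total volume.

```python
from collections import deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns (flow value, source side of a min cut)."""
    res = {u: dict(arcs) for u, arcs in cap.items()}      # residual capacities
    for u, arcs in cap.items():
        for v in arcs:
            res.setdefault(v, {}).setdefault(u, 0)        # add reverse arcs
    flow = 0
    while True:
        parent = {s: None}                                # BFS tree in the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                               # no augmenting path left
            return flow, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)            # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def cut_and_volume(adj, S):
    """Number of edges leaving S, and the sum of degrees inside S."""
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    return cut, vol

def mqi_sketch(adj, seed):
    """Repeatedly look for a subset of the current set with smaller cut/vol."""
    src, snk = object(), object()                         # hypothetical terminal nodes
    S = set(seed)
    while True:
        cut, vol = cut_and_volume(adj, S)
        if cut == 0:
            return S
        # Integer capacities: the ratio cut/vol is cleared by scaling with vol.
        cap = {src: {u: cut * len(adj[u]) for u in S}}
        for u in S:
            arcs = {v: vol for v in adj[u] if v in S}     # edges kept inside S
            leaving = sum(1 for v in adj[u] if v not in S)
            if leaving:
                arcs[snk] = vol * leaving                 # boundary edges go to the sink
            cap[u] = arcs
        value, side = min_cut_source_side(cap, src, snk)
        if value >= cut * vol:                            # no strictly better subset
            return S
        S = side - {src}

# Toy graph (an assumption): a 4-clique and a 5-clique linked through node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
adj = {i: set() for i in range(10)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

seed = {0, 1, 2, 3, 4}
improved = mqi_sketch(adj, seed)
```

On this toy graph the seed set has cut 2 and volume 16, and the sketch drops node 4 to return the 4-clique with cut 1 and volume 13. Each pass solves one maximum flow problem on a graph built only from the current set, which is the sense in which the data graph is modified implicitly.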
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Image and Video Quality Assessment
