Evolution of $K$-means solution landscapes with the addition of dataset outliers and a robust clustering comparison measure for their analysis
Luke Dicks, David J. Wales

TL;DR
This paper investigates how dataset outliers affect the solution landscape of K-means clustering using energy landscape analysis, revealing increased funneling complexity and proposing a robust clustering similarity measure based on kinetic pathways.
Contribution
It introduces an energy landscape approach to analyze K-means solutions with outliers and proposes a new outlier-robust clustering similarity measure based on kinetic analysis.
Findings
The solution landscape becomes more funnelled as outliers are added.
Shallow locally-funnelled regions correspond to different types of clustering solution.
The kinetic-based similarity measure is robust to outliers.
Abstract
The $K$-means algorithm remains one of the most widely used clustering methods due to its simplicity and general utility. The performance of $K$-means depends upon the location of minima low in the cost function, amongst a potentially vast number of solutions. Here, we use the energy landscape approach to map the change in $K$-means solution space as a result of increasing dataset outliers and show that the cost function surface becomes more funnelled. Kinetic analysis reveals that in all cases the overall funnel is composed of shallow locally-funnelled regions, each of which is separated by areas that do not support any clustering solutions. These shallow regions correspond to different types of clustering solution and their increasing number with outliers leads to longer pathways within the funnel and a reduced correlation between accuracy and cost function. Finally, we propose that the…
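The abstract's point that $K$-means has many candidate solutions of differing cost can be illustrated with a small experiment. The sketch below is not the energy landscape method used in the paper: it simply runs Lloyd's algorithm from many random starts and collects the distinct converged cost values as a crude proxy for distinct minima. The function names (`kmeans_cost`, `lloyd`, `sample_minima`), the synthetic blobs, and the way outliers are generated are all assumptions made for illustration.

```python
import numpy as np

def kmeans_cost(X, centers):
    """Sum of squared distances from each point to its nearest centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def lloyd(X, init_centers, max_iter=200):
    """Plain Lloyd iterations; an empty cluster keeps its previous centre."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, kmeans_cost(X, centers)

def sample_minima(X, n_clusters, n_starts, seed=0, decimals=6):
    """Distinct converged costs (rounded) over random data-point initialisations."""
    rng = np.random.default_rng(seed)
    costs = set()
    for _ in range(n_starts):
        idx = rng.choice(len(X), size=n_clusters, replace=False)
        _, cost = lloyd(X, X[idx])
        costs.add(round(float(cost), decimals))
    return sorted(costs)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical dataset: three well-separated Gaussian blobs in 2D.
    blobs = np.vstack([rng.normal(c, 0.4, size=(40, 2))
                       for c in ([0, 0], [4, 0], [0, 4])])
    # Hypothetical outliers: points scattered uniformly over a wider box.
    outliers = rng.uniform(-6, 10, size=(12, 2))
    with_outliers = np.vstack([blobs, outliers])

    for name, data in [("clean", blobs), ("with outliers", with_outliers)]:
        minima = sample_minima(data, n_clusters=3, n_starts=200)
        print(f"{name}: {len(minima)} distinct converged costs, "
              f"lowest = {minima[0]:.3f}")
```

Counting converged costs only hints at how many solutions exist; it says nothing about the barriers or pathways between them, which is what the kinetic analysis in the paper addresses.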
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Data Visualization and Analytics
