AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm
Marius Huber, Sara Kalisnik, Patrick Schnider

TL;DR
AuToMATo is an out-of-the-box, persistence-based clustering algorithm that combines topological data analysis with bootstrapping, outperforming many existing methods without requiring parameter tuning.
Contribution
It introduces AuToMATo, a new clustering method that combines persistent homology with bootstrapping and ships with default parameter choices, so it can be used out of the box without tuning.
Findings
AuToMATo performs favorably against state-of-the-art algorithms.
It often outperforms well-tuned alternative algorithms.
The implementation is available in Python and integrates with scikit-learn.
Abstract
We present AuToMATo, a novel clustering algorithm based on persistent homology. While AuToMATo is not parameter-free per se, we provide default choices for its parameters that make it into an out-of-the-box clustering algorithm that performs well across the board. AuToMATo combines the existing ToMATo clustering algorithm with a bootstrapping procedure in order to separate significant peaks of an estimated density function from non-significant ones. We perform a thorough comparison of AuToMATo (with its parameters fixed to their defaults) against many other state-of-the-art clustering algorithms. We find not only that AuToMATo compares favorably against parameter-free clustering algorithms, but in many instances also significantly outperforms even the best selection of parameters for other algorithms. AuToMATo is motivated by applications in topological data analysis, in particular the…
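To make the phrase "separate significant peaks of an estimated density function from non-significant ones" more concrete, the following is a minimal sketch of ToMATo-style mode seeking with a prominence threshold on 1D data. It is not the authors' implementation. The function names, the unnormalized Gaussian density estimate, the radius graph, and the hand-picked threshold `tau` are all assumptions made for illustration. The bootstrapping step that AuToMATo uses to decide which peaks are significant is omitted here.

```python
import math


def kernel_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate at each sample point."""
    n = len(points)
    return [
        sum(math.exp(-0.5 * ((p - q) / bandwidth) ** 2) for q in points) / n
        for p in points
    ]


def persistence_clusters(points, radius, bandwidth, tau):
    """Hypothetical helper: cluster 1D points by merging density peaks
    whose prominence is below tau (a simplified ToMATo-style procedure)."""
    density = kernel_density(points, bandwidth)
    # Visit points from highest to lowest estimated density.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    parent = {}  # union-find forest over the points processed so far

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        # Already-processed points within the radius have higher density.
        higher = [j for j in parent if abs(points[i] - points[j]) <= radius]
        if not higher:
            parent[i] = i  # local density peak: start a new cluster
            continue
        # Attach to the cluster of the densest neighbor.
        parent[i] = find(max(higher, key=lambda j: density[j]))
        # Point i may join several clusters; merge low-prominence ones.
        for j in higher:
            a, b = find(i), find(j)
            if a == b:
                continue
            low, high = sorted((a, b), key=lambda r: density[r])
            if density[low] - density[i] < tau:
                parent[low] = high  # keep the higher peak as the root
    return [find(i) for i in range(len(points))]


# Two dense groups joined by a single sparse bridge point at 1.0.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.6, 1.7, 1.8, 1.9, 2.0]
two = persistence_clusters(points, radius=0.65, bandwidth=0.2, tau=0.1)
one = persistence_clusters(points, radius=0.65, bandwidth=0.2, tau=0.5)
print(len(set(two)), len(set(one)))
```

With a small `tau` both peaks count as significant and stay separate; with a large `tau` the lower peak is merged into the higher one. Choosing this threshold is the parameter-selection problem that, according to the abstract, AuToMATo addresses with bootstrapping.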
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
This paper is extremely clear, and presents a novel algorithm which successfully solves the problem of parameter selection in the ToMATo algorithm. The newly presented algorithm is easy to use and likely to be an effective drop-in replacement for the ToMATo algorithm.
The precise novel contribution of this paper is not completely clear. Given that the bottleneck bootstrap process was defined by Chazal et al. (2017), what is the key new insight that enables the AuToMATo algorithm to work? This should be made clearer in the write-up. The datasets used for the experimental evaluation seem to be generally quite small (< 10000 data points), low-dimensional synthetic datasets. It would be interesting to see a comparison on some larger, real-world datasets. For
The paper presents AuToMATo, a new and improved version of the topological clustering algorithm ToMATo. The authors provide a rigorous explanation of the algorithm, including detailed mathematical definitions. The paper is generally well-written. AuToMATo achieves competitive performance without the need for manual parameter tuning.
The experiments only use datasets from the Clustering Benchmarks suite. Including more high-dimensional and real-world datasets would better evaluate the AuToMATo algorithm's scalability and performance. The paper does not provide enough experiments and discussion comparing AuToMATo to other parameter-free clustering algorithms, which is necessary to demonstrate its effectiveness for this contribution. The paper does not sufficiently examine how changes in parameters affect the algorithm. Wh
S1) Easy to follow, written clearly. S2) Good idea, and important concepts are used. S3) Good reasoning and background information behind the choices (for experiments as well as design choices in the algorithm development).
W1) Experiments are not sufficient. a) Even though Fowlkes Mallows is a good evaluation measure, state-of-the-art papers for clustering usually include the NMI and ARI values, so please include at least one of them additionally in the appendix. b) Presentation of the experiments is hard to follow: In the main paper, only the average over all datasets is given, which is not enough to get an idea where the algorithm is good and where not. Instead of comparing AuToMATo with competitors individuall
- The paper presents an implementation of a hierarchical clustering ToMATo. - Some experiments and an ablation study on ToMATo with Mapper, which approximates the Reeb graph of a manifold from the sampled points, show some potential of the implementation.
I feel that the novelty of the work is limited in the sense that the paper implements a clustering algorithm and presents some comparison with other clustering competitors. Regarding the clustering accuracy, the paper uses FMI scores of clustering competitors, though the improvement of ToMATo is quite marginal. It would be better to use other popular measures, including AMI or NMI, since these measures are less sensitive to differing numbers of clusters and cluster sizes. Also, there should be the
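Several of the reviews above ask for ARI, AMI, or NMI to be reported alongside the Fowlkes-Mallows index (FMI). As a rough illustration of why that matters, the sketch below computes FMI and ARI from pair counts in plain Python. The function names and the toy labelings are assumptions made for this example, not taken from the paper. In practice one would more likely use `fowlkes_mallows_score`, `adjusted_rand_score`, and `normalized_mutual_info_score` from `sklearn.metrics`.

```python
import math
from itertools import combinations


def pair_counts(truth, pred):
    """Count point pairs by whether they share a cluster in each labeling."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def fowlkes_mallows(truth, pred):
    """Geometric mean of pairwise precision and recall."""
    tp, fp, fn, _ = pair_counts(truth, pred)
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))


def adjusted_rand(truth, pred):
    """Rand index corrected for chance, in its pair-count form."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    denom = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    if denom == 0:
        return 1.0  # both labelings are trivial and identical
    return 2.0 * (tp * tn - fn * fp) / denom


truth = [0, 0, 0, 1, 1, 1]
one_cluster = [0] * 6  # trivial labeling: everything in a single cluster

fmi_trivial = fowlkes_mallows(truth, one_cluster)
ari_trivial = adjusted_rand(truth, one_cluster)
print(round(fmi_trivial, 3), ari_trivial)
```

On this toy input the trivial single-cluster labeling still earns an FMI of about 0.63, while its ARI is 0. Chance-corrected measures such as ARI are therefore a useful complement when algorithms return different numbers of clusters.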
Taxonomy
Topics: Advanced Clustering Algorithms Research
