Fast, Linear Time Hierarchical Clustering using the Baire Metric
Pedro Contreras, Fionn Murtagh

TL;DR
This paper introduces a linear-time hierarchical clustering method using the Baire metric, demonstrating its efficiency and effectiveness on large astronomical datasets compared to traditional methods.
Contribution
The work empirically evaluates a novel linear-time hierarchical clustering approach based on the Baire metric, comparing it with existing algorithms and applying it to large-scale astronomical data.
Findings
Baire-based clustering is faster than agglomerative methods.
The approach effectively predicts spectrometric redshifts from photometric data.
It scales to a dataset of about half a million Sloan Digital Sky Survey objects.
Abstract
The Baire metric induces an ultrametric on a dataset and has linear computational complexity, in contrast to the standard quadratic-time agglomerative hierarchical clustering algorithm. In this work we empirically evaluate this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in-depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey, using about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily…
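The abstract's two key claims, that the Baire metric is an ultrametric and that the resulting hierarchy is built in linear time, can be illustrated with a minimal Python sketch. It assumes values in [0, 1) (for example redshifts) truncated to a fixed number of decimal digits; the distance is `base ** -k`, where `k` is the length of the longest common digit prefix. The function names and the `precision` and `base` parameters are hypothetical illustrations, not the authors' code. Identical digit strings are assigned distance 0 here so that d(x, x) = 0.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_DOWN


def digits(x: float, precision: int) -> str:
    """Truncate x (assumed in [0, 1)) to `precision` decimal digits, returned as a string."""
    q = Decimal(str(x)).quantize(Decimal(10) ** -precision, rounding=ROUND_DOWN)
    return f"{q:f}".split(".")[1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def baire_distance(x: float, y: float, precision: int = 3, base: int = 10) -> float:
    """Baire distance: base**-k, where k is the length of the shared digit prefix.

    Assumption: identical truncated digit strings get distance 0.
    """
    dx, dy = digits(x, precision), digits(y, precision)
    if dx == dy:
        return 0.0
    return float(base) ** -common_prefix_len(dx, dy)


def baire_hierarchy(values, precision: int = 3):
    """Group indices by digit prefix at each level 1..precision.

    Each level takes one pass over the data, so the total cost is
    O(N * precision), which is linear in N for a fixed precision.
    """
    strings = [digits(v, precision) for v in values]  # compute digits once per value
    levels = {}
    for level in range(1, precision + 1):
        buckets = defaultdict(list)
        for i, s in enumerate(strings):
            buckets[s[:level]].append(i)
        levels[level] = dict(buckets)
    return levels


if __name__ == "__main__":
    vals = [0.123, 0.125, 0.131, 0.456]
    print(baire_distance(0.123, 0.125))  # shared prefix "12", so 10**-2
    for level, clusters in baire_hierarchy(vals).items():
        print(level, clusters)
```

With the four sample values, level 1 splits them into `{"1": [0, 1, 2], "4": [3]}` and level 2 into `{"12": [0, 1], "13": [2], "45": [3]}`. The clusters at level k are exactly the groups whose pairwise Baire distance is at most 10^-k. This is why no pairwise distance matrix, and hence no quadratic step, is needed.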
