Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement
Fionn Murtagh, Pedro Contreras

TL;DR
This paper introduces a fast, linear-time hierarchical clustering method using the Baire metric, enabling efficient search and retrieval in high-dimensional data by leveraging data measurement precision and random projections.
Contribution
It presents a novel m-adic hierarchical clustering approach based on the Baire metric that operates in linear time, and links it to generalized ultrametrics, hashing, formal concept analysis, and the precision of data measurement.
Findings
Hierarchical clusters can be obtained in a single data pass.
Random projections extend the method to multidimensional data, including very high dimensional data.
The precision of data measurement has practical implications for the resulting clustering.
Abstract
We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on practical implications of precision of data measurement. As a mechanism for treating multidimensional data, including very high dimensional data, we use random projections.
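The idea of reading a hierarchy directly from one pass through the data can be illustrated with a minimal sketch. It assumes the usual longest-common-prefix form of the Baire distance in base 10 (distance 10^-k when two values share their first k digits, and 1 when they share none), values in [0, 1) supplied as decimal strings so that the measurement precision is explicit, and a distance of 0 for values identical at the chosen precision. The function names `frac_digits`, `common_prefix_len`, `baire_distance`, and `prefix_hierarchy` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def frac_digits(value, precision):
    """Return the first `precision` fractional digits of a decimal string.

    Assumes `value` is a string such as "0.4251" with 0 <= value < 1.
    Short values are padded with zeros; long ones are truncated.
    """
    fractional = value.partition(".")[2]
    return fractional.ljust(precision, "0")[:precision]


def common_prefix_len(a, b):
    """Length of the longest common prefix of two digit strings."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def baire_distance(u, v, precision, base=10):
    """Longest-common-prefix (Baire) distance at a given precision.

    Assumption: values identical at this precision get distance 0.
    """
    k = common_prefix_len(frac_digits(u, precision), frac_digits(v, precision))
    if k == precision:
        return 0.0
    return float(base) ** -k


def prefix_hierarchy(values, precision):
    """Build one partition per digit level in a single pass over the data.

    levels[k - 1] maps each k-digit prefix to the values that share it.
    Clusters at level k are nested inside clusters at level k - 1, and
    no pairwise distance is ever computed.
    """
    levels = [defaultdict(list) for _ in range(precision)]
    for v in values:  # the single pass over the data
        d = frac_digits(v, precision)
        for k in range(1, precision + 1):
            levels[k - 1][d[:k]].append(v)
    return levels


values = ["0.4251", "0.4273", "0.4910", "0.1302"]
levels = prefix_hierarchy(values, precision=4)
```

In this sketch the work per value is bounded by the precision, so the cost grows linearly with the number of values for a fixed precision. The precision also sets the depth of the hierarchy, which is one way to see why measurement precision matters for the clusters obtained.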
