Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

Fionn Murtagh; Pedro Contreras

arXiv:1005.2638·stat.ML·March 17, 2015

Open Access

TL;DR

This paper explores hierarchical clustering techniques to identify symmetries and patterns in large, high-dimensional datasets, emphasizing their role in uncovering invariants and intrinsic data properties.

Contribution

It reviews the theoretical foundations of hierarchy, including ultrametric topology and algebraic structures, and demonstrates their application in analyzing complex real-world data.

Findings

01

Hierarchical clustering reveals symmetries in data.

02

Case studies in chemistry and finance illustrate the effectiveness of hierarchical clustering.

03

Hierarchies help identify invariants and intrinsic properties.

Abstract

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries point directly to invariants, which pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametric, linkages with lattices and other discrete algebraic structures, and p-adic number representations. By focusing on symmetries in data, we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.
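
The link between hierarchy and ultrametric topology mentioned in the abstract can be illustrated with a minimal sketch. It assumes single-linkage agglomerative clustering on a toy 2D point set (neither the linkage nor the data comes from the paper), and the names `single_linkage_ultrametric` and `is_ultrametric` are hypothetical. The sketch checks that the merge-height distance induced by the hierarchy satisfies the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), while the raw Euclidean distances do not.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage_ultrametric(points):
    """Return the matrix of merge heights (cophenetic distances)
    produced by naive single-linkage agglomerative clustering."""
    n = len(points)
    clusters = [[i] for i in range(n)]
    u = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest minimum inter-point distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross-cluster pair first shares a cluster at height d.
        for i in clusters[a]:
            for j in clusters[b]:
                u[i][j] = u[j][i] = d
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return u


def is_ultrametric(m, tol=1e-12):
    """Check the strong triangle inequality for every triple of indices."""
    n = len(m)
    return all(m[i][k] <= max(m[i][j], m[j][k]) + tol
               for i in range(n) for j in range(n) for k in range(n))


# Toy data (an assumption for illustration only).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 1.0), (20.0, 3.0)]
raw = [[dist(p, q) for q in points] for p in points]
u = single_linkage_ultrametric(points)

print("raw distances ultrametric:", is_ultrametric(raw))
print("hierarchy distances ultrametric:", is_ultrametric(u))
```

The naive pairwise search is cubic or worse in the number of points and is meant only to make the ultrametric property visible, not to scale to massive data.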

Taxonomy

Topics: Advanced mathematical theories · Topological and Geometric Data Analysis · Data Management and Algorithms