The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering
Fionn Murtagh

TL;DR
This paper shows that very high dimensional data tend toward simple hierarchical structure, which an ultrametric topology captures, and discusses application to segmenting and modeling very high frequency time series.
Contribution
It quantifies the extent or degree of ultrametricity in a data set, which measures how hierarchical the data are, and uses this to argue that very high dimensional data are of simple structure.
Findings
Ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases.
High-dimensional data exhibit pervasive hierarchical structure.
Applications include segmentation and modeling of very high frequency time series.
Abstract
An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying the extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity increases. This leads us to assert that very high dimensional data are of simple structure. We exemplify this finding through a range of simulated data cases. We also discuss application to very high frequency time series segmentation and modeling.
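The abstract's central claim can be explored numerically. The sketch below is not the paper's own ultrametricity measure; it assumes a simpler proxy. In an ultrametric space every triangle is isosceles with a small base, that is, its two largest sides are equal. The proxy therefore samples random triplets of points and reports the fraction whose two largest Euclidean side lengths agree to within a relative tolerance. The function names `ultrametric_fraction` and `simulate`, the tolerance `rel_tol`, and the choice of uniformly distributed simulated data are all illustrative assumptions.

```python
import numpy as np


def ultrametric_fraction(points, n_triplets=2000, rel_tol=0.05, seed=0):
    """Fraction of sampled triangles whose two largest sides are nearly equal.

    This is an assumed proxy for the degree of ultrametricity, not the
    coefficient defined in the paper.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    hits = 0
    for _ in range(n_triplets):
        # Draw three distinct point indices.
        i, j, k = rng.choice(n, size=3, replace=False)
        sides = sorted(
            [
                np.linalg.norm(points[i] - points[j]),
                np.linalg.norm(points[j] - points[k]),
                np.linalg.norm(points[i] - points[k]),
            ]
        )
        # Ultrametric triangles are isosceles with a small base:
        # the two largest sides coincide (here, up to rel_tol).
        if sides[2] - sides[1] <= rel_tol * sides[2]:
            hits += 1
    return hits / n_triplets


def simulate(dims=(2, 20, 2000), n_points=100, seed=0):
    """Estimate the proxy on uniform random data of increasing dimension."""
    rng = np.random.default_rng(seed)
    return {
        d: ultrametric_fraction(rng.uniform(size=(n_points, d)), seed=seed)
        for d in dims
    }


if __name__ == "__main__":
    for dim, frac in simulate().items():
        print(f"dimension {dim}: fraction of near-ultrametric triangles = {frac:.3f}")
```

Under these assumptions the fraction should rise toward 1 as the dimension grows, because pairwise distances concentrate around a common value in high dimensions. This is consistent with, though much cruder than, the behaviour the abstract reports.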
