Dissimilarity functions for rank-invariant hierarchical clustering of continuous variables

Sebastian Fuchs; F. Marta L. Di Lascio; Fabrizio Durante

arXiv:2007.04799·stat.ME·February 4, 2021·Comput. Stat. Data Anal.

TL;DR

This paper introduces a copula-based dissimilarity measure for continuous random vectors, emphasizing its properties and suitability for hierarchical clustering, supported by simulations and real case studies.

Contribution

It proposes a novel dissimilarity function based on copulas that is rank-invariant and suitable for hierarchical clustering of continuous variables.

Findings

01

The dissimilarity takes its smallest value for a pair of comonotonic random vectors.

02

It has properties, such as reducibility, that matter for hierarchical agglomerative methods.

03

Simulation and real data demonstrate its effectiveness in clustering.

Abstract

A theoretical framework is presented for a (copula-based) notion of dissimilarity between continuous random vectors. The proposed dissimilarity assigns the smallest value to a pair of random vectors that are comonotonic. Its main properties are studied, with special attention to those that are relevant to hierarchical agglomerative methods, such as reducibility. Some insights are provided for the use of such a measure in clustering algorithms, and a simulation study is presented. Real case studies illustrate the main features of the whole methodology.
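To make "rank-invariant" and "smallest for comonotonic pairs" concrete, here is a minimal sketch. It is not the dissimilarity defined in the paper, which is copula-based and acts on random vectors. As a stand-in it uses a pairwise dissimilarity between two univariate samples built from Spearman's rho, `(1 - rho) / 2`. The function names `ranks`, `spearman_rho`, and `rank_dissimilarity` are hypothetical, and the data are simulated.

```python
import math
import random


def ranks(xs):
    """Return 1-based ranks of xs; assumes no ties (continuous data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def rank_dissimilarity(x, y):
    """Illustrative stand-in (an assumption, not the paper's definition):
    0 for comonotonic samples, 1 for countermonotonic samples."""
    return (1.0 - spearman_rho(x, y)) / 2.0


# Simulated data: y is a noisy increasing function of x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
noise = [random.gauss(0, 1) for _ in range(200)]
y = [a + 0.5 * e for a, e in zip(x, noise)]

d_xy = rank_dissimilarity(x, y)

# Strictly increasing transformations leave the ranks, and so the value, unchanged.
d_transformed = rank_dissimilarity([math.exp(a) for a in x], [b ** 3 for b in y])

# A comonotonic pair (one sample is an increasing function of the other)
# attains the smallest possible value.
d_comonotone = rank_dissimilarity(x, [math.exp(a) for a in x])

print(d_xy == d_transformed, d_comonotone)
```

Because the value depends on the data only through ranks, a matrix of such pairwise dissimilarities can be passed to any agglomerative routine and the resulting tree does not change when a variable is rescaled or log-transformed.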
