Clustering Tails in High Dimension

Liujun Chen; Marco Oesting; Chen Zhou

arXiv:2506.19414 · stat.ME · June 25, 2025

TL;DR

This paper introduces an iterative clustering method for high-dimensional data based on extreme value indices, enabling effective grouping of variables with similar tail behaviors for better extreme value analysis.

Contribution

It proposes a novel iterative clustering algorithm that sequentially groups variables by their tail heaviness, with proven consistency and demonstrated effectiveness.

Findings

01

The algorithm accurately clusters variables with similar tail indices.

02

Simulation results show strong finite-sample performance.

03

Application to real data validates practical utility.

Abstract

One way to address the scarcity of tail observations in extreme value analysis is to integrate information from multiple datasets that share similar tail properties, for instance a common extreme value index. For a multivariate dataset, this means first grouping the dimensions into clusters and only then applying any pooling technique. This paper addresses the problem of clustering the dimensions of a high-dimensional dataset according to their extreme value indices. We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts the group of variables that share the highest extreme value index among the remaining ones. This approach differs fundamentally from conventional clustering methods such as using pre-estimated extreme…
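The heaviest-to-lightest extraction order described in the abstract can be illustrated with a toy sketch. This is a simplification under stated assumptions, not the authors' procedure. Each variable's extreme value index is pre-estimated with the Hill estimator (assuming positive, heavy-tailed data), and a hypothetical tolerance `tol` decides which remaining variables join the current group. Clustering on pre-estimated indices is the kind of conventional approach the abstract says the paper's method differs from, so the sketch only shows the sequential "peel off the heaviest tail first" structure. The names `hill_estimate`, `sequential_tail_clusters`, `k`, and `tol` are hypothetical.

```python
import math
import random


def hill_estimate(sample, k):
    """Hill estimator of a positive extreme value index from the k largest values.

    Assumes every observation is positive and 0 < k < len(sample).
    """
    if not 0 < k < len(sample):
        raise ValueError("need 0 < k < len(sample)")
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest observation
    return sum(math.log(xs[i] / threshold) for i in range(k)) / k


def sequential_tail_clusters(columns, k, tol):
    """Toy sequential clustering, heaviest tail first (NOT the authors' method).

    columns: dict mapping a variable name to a list of positive observations.
    k: number of upper order statistics used by the Hill estimator.
    tol: hypothetical tolerance; a remaining variable joins the current group
         when its estimate is within tol of the largest remaining estimate.
    """
    gamma = {name: hill_estimate(obs, k) for name, obs in columns.items()}
    remaining = set(columns)
    clusters = []
    while remaining:
        # Largest estimated index among variables not yet clustered.
        top = max(gamma[v] for v in remaining)
        # Extract every remaining variable close enough to that maximum;
        # the maximizer itself always qualifies, so the loop terminates.
        group = sorted(v for v in remaining if gamma[v] >= top - tol)
        clusters.append(group)
        remaining -= set(group)
    return clusters, gamma


if __name__ == "__main__":
    random.seed(0)
    # A Pareto(alpha) variable has extreme value index 1 / alpha.
    alphas = [1.0, 1.0, 2.0, 2.0, 5.0]
    columns = {
        f"x{j}": [random.paretovariate(a) for _ in range(20000)]
        for j, a in enumerate(alphas)
    }
    clusters, gamma = sequential_tail_clusters(columns, k=1000, tol=0.15)
    print(clusters)
    print({name: round(g, 3) for name, g in gamma.items()})
```

With simulated Pareto columns whose true indices are 1, 1, 0.5, 0.5 and 0.2, the sketch should typically return the three groups in that heaviest-to-lightest order. Choosing `k` and `tol` in practice is a separate, harder problem that this toy version ignores.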