# A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

**Authors:** Maël Chiapino (LTCI), Stéphan Clémençon (LTCI), Vincent Feuillard, Anne Sabourin (LTCI)

arXiv: 1907.07523 · 2019-07-18

## TL;DR

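This paper introduces a mixture model, grounded in multivariate extreme value theory, in which the anomaly type (the subgroup of variables that are simultaneously extreme) is a latent variable. The posterior probabilities of the anomaly types yield a similarity measure between extreme observations, which is then used to cluster them and to display them in the plane.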

## Contribution

It proposes a new mixture model that treats anomaly types as latent variables, facilitating clustering and visualization of extreme observations.
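
As a generic illustration of what treating the anomaly type as a latent variable implies, Bayes' rule for any finite mixture gives the posterior probability below. The symbols $Z$ (latent anomaly type), $\pi_\alpha$ (mixture weight) and $f_\alpha$ (component density) are notation assumed here for illustration; the paper's actual component distributions are not reproduced in this summary.

$$
\mathbb{P}(Z = \alpha \mid x) \;=\; \frac{\pi_\alpha \, f_\alpha(x)}{\sum_{\beta} \pi_\beta \, f_\beta(x)}
$$

Here the sum runs over all anomaly types $\beta$ in the mixture, so the posteriors of a given extreme point $x$ sum to one.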

## Key findings

- The clustering and 2-d visual display are illustrated on simulated datasets.
- The same approach is applied to real observations from the aeronautics domain.
- The posterior probability of each anomaly type, assigned to every extreme point, implicitly defines a similarity measure between anomalies.
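
The posterior-based similarity listed above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the posteriors are stored in a matrix `P` (one row per extreme point, one column per anomaly type), takes the inner product of two rows as the similarity, and uses thresholding plus connected components as a deliberately simple stand-in for the graph-mining tools mentioned in the abstract. The function names, the toy numbers and the threshold `0.5` are all hypothetical.

```python
import numpy as np

def posterior_similarity(posteriors):
    """Return S with S[i, j] = sum_k P[i, k] * P[j, k].

    Assumed similarity: the probability that points i and j share the same
    anomaly type if their types were drawn independently from their posteriors.
    """
    P = np.asarray(posteriors, dtype=float)
    if P.ndim != 2:
        raise ValueError("posteriors must be a 2-d array")
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of posteriors must sum to 1")
    return P @ P.T

def connected_components(adjacency):
    """Label the connected components of an undirected graph (depth-first search)."""
    n = adjacency.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                j = int(j)
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical posteriors for 4 extreme points over 3 anomaly types.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])
S = posterior_similarity(P)

# Keep an edge between two points when their similarity is at least 0.5
# (arbitrary illustrative threshold), then read clusters off the graph.
labels = connected_components(S >= 0.5)
print(labels)
```

In practice the weighted matrix `S` could be passed directly to a graph clustering or graph layout routine instead of being thresholded; the hard threshold is used here only to keep the sketch dependency-free.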

## Abstract

In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $X = (X_1, \ldots, X_d)$ valued in $\mathbb{R}^d$, correspond to the simultaneous occurrence of extreme values for certain subgroups $\alpha \subset \{1, \ldots, d\}$ of variables $X_j$. Under the heavy-tail assumption, which is precisely appropriate for modeling these phenomena, statistical methods relying on multivariate extreme value theory have been developed in the past few years for identifying such events/subgroups. This paper exploits this approach much further by means of a novel mixture model that makes it possible to describe the distribution of extremal observations and where the anomaly type $\alpha$ is viewed as a latent variable. One may then take advantage of the model by assigning to any extreme point a posterior probability for each anomaly type $\alpha$, defining implicitly a similarity measure between anomalies. It is explained at length how the latter makes it possible to cluster extreme observations and obtain an informative planar representation of anomalies using standard graph-mining tools. The relevance and usefulness of the clustering and 2-d visual display thus designed are illustrated on simulated datasets and on real observations as well, in the aeronautics application domain.
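
To make the notion of a subgroup $\alpha$ of simultaneously extreme variables concrete, the sketch below standardizes each margin with a rank transform and then reads off, for every point with a large sup-norm, the coordinates that are large relative to that norm. This is a simple thresholding heuristic assumed for illustration, not the mixture model of the paper; the function names, the radius `50.0`, the tolerance `eps` and the simulated data are all hypothetical.

```python
from collections import Counter

import numpy as np

def pareto_rank_transform(X):
    """Map each column to an approximately standard Pareto scale using ranks.

    V[i, j] = 1 / (1 - rank(X[i, j]) / (n + 1)), ranks running from 1 to n.
    Ties are broken arbitrarily by the sort order.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    return 1.0 / (1.0 - ranks / (n + 1.0))

def extreme_subgroups(V, radius, eps=0.3):
    """For each row with sup-norm above `radius`, return the tuple of
    coordinates j such that V[i, j] > eps * (sup-norm of row i)."""
    norms = V.max(axis=1)
    groups = {}
    for i in np.flatnonzero(norms > radius):
        alpha = tuple(int(j) for j in np.flatnonzero(V[i] > eps * norms[i]))
        groups[int(i)] = alpha
    return groups

# Simulated heavy-tailed data (assumption): coordinates 0 and 1 share a common
# Pareto factor and tend to be extreme together; coordinate 2 is independent.
rng = np.random.default_rng(0)
n = 1000
shared = rng.pareto(1.0, size=n) + 1.0
X = np.column_stack([
    shared + rng.normal(scale=0.1, size=n),
    shared + rng.normal(scale=0.1, size=n),
    rng.pareto(1.0, size=n) + 1.0,
])

V = pareto_rank_transform(X)
groups = extreme_subgroups(V, radius=50.0)

# Count how often each subgroup alpha appears among the extreme points.
print(Counter(groups.values()))
```

The rank transform removes the influence of marginal scales, so "extreme" means the same thing in every coordinate; the choice of `radius` and `eps` trades off the number of retained points against how strictly a coordinate must be large to belong to $\alpha$.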

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.07523/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1907.07523/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/1907.07523/full.md

---
Source: https://tomesphere.com/paper/1907.07523