# Hybrid Wasserstein Distance and Fast Distribution Clustering

**Authors:** Isabella Verdinelli, Larry Wasserman

arXiv: 1812.11026 · 2018-12-31

## TL;DR

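This paper defines a modified Wasserstein distance for clustering distributions. The distance is the sum of a closed-form term that measures location-scale differences and an approximate term that measures the remaining distance; the main approximation studied is a tangent space approximation estimated by nonparametric regression. The goal is a distance that is easy to estimate and quick to compute while inheriting many properties of the Wasserstein distance.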

## Contribution

The paper proposes a two-term distance for distribution clustering that inherits many properties of the Wasserstein distance but can be estimated easily and computed quickly.

## Key findings

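- The location-scale term has a closed form, so it can be computed directly
- The remaining distance can be approximated in several ways; the main emphasis is a tangent space approximation estimated with nonparametric regression
- Strengths and weaknesses of the approach are evaluated on simulated and real examples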

## Abstract

We define a modified Wasserstein distance for distribution clustering which inherits many of the properties of the Wasserstein distance but which can be estimated easily and computed quickly. The modified distance is the sum of two terms. The first term --- which has a closed form --- measures the location-scale differences between the distributions. The second term is an approximation that measures the remaining distance after accounting for location-scale differences. We consider several forms of approximation with our main emphasis being a tangent space approximation that can be estimated using nonparametric regression. We evaluate the strengths and weaknesses of this approach on simulated and real examples.
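
To make the two-term structure concrete, here is a minimal sketch for one-dimensional samples. It rests on assumptions the abstract does not state: the distance is the squared 2-Wasserstein distance, and the location-scale term is the standard closed form for a one-dimensional location-scale family, `(mean difference)^2 + (standard deviation difference)^2`. The function names are hypothetical. The residual is computed exactly from sorted samples, which is only practical in one dimension; it is not the paper's tangent space approximation.

```python
import math


def mean_and_sd(xs):
    """Return the sample mean and population standard deviation (divide by n)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return m, math.sqrt(var)


def location_scale_term(xs, ys):
    """Closed-form term (assumed form): squared W2 distance between two
    members of a 1-D location-scale family with the samples' means and sds."""
    mx, sx = mean_and_sd(xs)
    my, sy = mean_and_sd(ys)
    return (mx - my) ** 2 + (sx - sy) ** 2


def empirical_w2_squared(xs, ys):
    """Exact squared 2-Wasserstein distance between two 1-D empirical
    distributions of equal size: match sorted samples pairwise."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((a - b) ** 2 for a, b in pairs) / len(xs)


def decompose(xs, ys):
    """Split the squared distance into (location-scale term, residual)."""
    total = empirical_w2_squared(xs, ys)
    ls = location_scale_term(xs, ys)
    return ls, total - ls


if __name__ == "__main__":
    # ys is an affine transform of xs, so the residual should vanish.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [10.0 + 2.0 * x for x in xs]
    print(decompose(xs, ys))

    # Same mean, different shape: part of the distance is not location-scale.
    print(decompose([-1.0, -1.0, 1.0, 1.0], [-2.0, 0.0, 0.0, 2.0]))
```

When one sample is an increasing affine transform of the other, the residual is zero up to rounding, so the closed-form term carries the whole distance. In one dimension the residual is never negative, because it equals `2 * sd_x * sd_y * (1 - r)`, where `r` is the correlation between the two sorted samples.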

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.11026/full.md

## Figures

33 figures with captions in the complete paper: https://tomesphere.com/paper/1812.11026/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1812.11026/full.md

---
Source: https://tomesphere.com/paper/1812.11026