The Perturbed Variation
Maayan Harel, Shie Mannor

TL;DR
The paper introduces a new discrepancy score called Perturbed Variation that measures the similarity between two distributions by optimally perturbing them, with efficient estimation, convergence bounds, and hypothesis testing procedures.
Contribution
It proposes the Perturbed Variation score for assessing distribution similarity, including estimation methods, theoretical bounds, and hypothesis testing procedures.
Findings
The score can be efficiently estimated from samples.
The statistical power of the hypothesis tests is demonstrated in simulations.
The score's capacity to detect similarity is compared with that of other known measures on real data.
Abstract
We introduce a new discrepancy score between two distributions that gives an indication of their similarity. While much research has been done to determine if two samples come from exactly the same distribution, much less research has considered the problem of determining if two finite samples come from similar distributions. The new score gives an intuitive interpretation of similarity; it optimally perturbs the distributions so that they best fit each other. The score is defined between distributions, and can be efficiently estimated from samples. We provide convergence bounds of the estimated score, and develop hypothesis testing procedures that test if two data sets come from similar distributions. The statistical power of these procedures is presented in simulations. We also compare the score's capacity to detect similarity with that of other known measures on real data.
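The abstract does not spell out how the score is estimated from samples. As a rough illustration only, the sketch below assumes a matching-based estimator: two sample points may be paired when they lie within a perturbation radius, a maximum matching is computed, and the score is the average fraction of points left unmatched in each sample. The function name `pv_estimate`, the parameter `eps`, and the default Euclidean distance are assumptions made for this example, not names taken from the paper.

```python
import math


def pv_estimate(xs, ys, eps, dist=math.dist):
    """Hypothetical sketch: average fraction of unmatched points after a
    maximum bipartite matching between xs and ys, where a pair may be
    matched only if dist(x, y) <= eps. Assumes both samples are non-empty
    and small enough for a simple recursive matching routine."""
    # adj[i] lists the indices of ys that xs[i] is allowed to pair with.
    adj = [[j for j, y in enumerate(ys) if dist(x, y) <= eps] for x in xs]
    match_of_y = [-1] * len(ys)  # index of the x matched to each y, or -1

    def try_augment(i, seen):
        # Kuhn's augmenting-path step: try to find a partner for xs[i],
        # re-routing earlier matches when necessary.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_y[j] == -1 or try_augment(match_of_y[j], seen):
                match_of_y[j] = i
                return True
        return False

    matched = sum(try_augment(i, set()) for i in range(len(xs)))
    unmatched_x = len(xs) - matched
    unmatched_y = len(ys) - matched
    return 0.5 * (unmatched_x / len(xs) + unmatched_y / len(ys))


if __name__ == "__main__":
    a = [(0.0,), (1.0,), (5.0,)]
    b = [(0.1,), (1.2,), (9.0,)]
    print(pv_estimate(a, b, eps=0.3))
```

Under these assumptions, a value near 0 means almost every point in one sample has a nearby partner in the other, and a value near 1 means almost none do. The radius `eps` sets how much perturbation counts as "similar" and has to be chosen for the data at hand.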
Taxonomy
Topics: Time Series Analysis and Forecasting · Forecasting Techniques and Applications · Data Stream Mining Techniques
