Similarity of samples and trimming
Pedro C. Álvarez-Esteban, Eustasio del Barrio, Juan A. Cuesta-Albertos, Carlos Matrán

TL;DR
This paper introduces a model for assessing the similarity of probability distributions based on contamination levels and explores how trimming affects empirical measures, proposing a bootstrap method for practical similarity testing.
Contribution
It establishes a connection between similarity of probabilities and minimal distances between trimmed probability sets, and develops a bootstrap approach for empirical similarity assessment.
Findings
Overfitting occurs when trimming exceeds the similarity level.
Empirical trimmed samples tend to be closer to each other than expected.
A bootstrap approach can be used to assess similarity from two data samples.
Abstract
We say that two probabilities are similar at level $\alpha$ if they are contaminated versions (up to an $\alpha$ fraction) of the same common probability. We show how this model is related to minimal distances between sets of trimmed probabilities. Empirical versions turn out to present an overfitting effect in the sense that trimming beyond the similarity level results in trimmed samples that are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
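As a rough illustration of the similarity model, write alpha for the similarity level: P and Q are similar at level alpha when P = (1 - alpha) R + alpha P' and Q = (1 - alpha) R + alpha Q' for some probabilities R, P', Q'. For discrete distributions on a finite support, this is equivalent to the total variation distance between P and Q being at most alpha. That reduction is an elementary consequence of the contamination model, not a statement quoted from the abstract. The sketch below checks it directly; the function names and the example distributions are hypothetical, and it does not implement the paper's trimming or bootstrap procedure.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are dicts mapping outcome -> probability (illustrative format).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def similar_at_level(p, q, alpha, tol=1e-12):
    """True if p and q share a common component of mass at least 1 - alpha.

    Assumption: similarity at level alpha is checked through the
    equivalent condition total_variation(p, q) <= alpha.
    tol absorbs floating-point rounding at the boundary.
    """
    return total_variation(p, q) <= alpha + tol


# Hypothetical example: the two distributions differ only in how
# a mass of 0.1 is split between outcomes "b" and "c".
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.5, "b": 0.2, "c": 0.3}

print(similar_at_level(p, q, alpha=0.10))
print(similar_at_level(p, q, alpha=0.05))
```

With two data samples rather than known distributions, the empirical version of this check is subject to the overfitting effect described in the abstract, which is what motivates the bootstrap approach.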
