Identity testing under label mismatch
Clément L. Canonne, Karl Wimmer

TL;DR
This paper studies identity testing when the observed distribution may be a relabeled (permuted) version of the reference distribution, modeling systematic labeling errors in how the data is reported.
Contribution
It formulates identity testing under promise of permutation, a setting of interest that the standard identity testing formulation does not capture.
Findings
Develops new testing algorithms robust to label permutation errors
Provides theoretical bounds on testing accuracy under label mismatch
Demonstrates effectiveness through analysis and potential applications
Abstract
Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model: however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.
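To make the setting concrete, here is a minimal sketch in Python. It is not the paper's testing algorithm: it only illustrates how a relabeling can leave the data distribution identical to the reference model or move it far away. The helper names (`permute`, `total_variation`), the example distribution `q`, and the use of total variation distance as the measure of "statistically far" are all assumptions made for illustration.

```python
def permute(q, pi):
    """Return the relabeled distribution p with p[i] = q[pi[i]] (hypothetical helper)."""
    return [q[pi[i]] for i in range(len(q))]


def total_variation(p, q):
    """Total variation distance between two distributions on the same domain.

    Assumption: this is the distance used to quantify "statistically far".
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Reference model over a 4-element domain (illustrative values only).
q = [0.5, 0.25, 0.125, 0.125]

identity = [0, 1, 2, 3]      # no labeling error
swap_light = [0, 1, 3, 2]    # swaps two labels that carry equal mass
swap_heavy = [1, 0, 2, 3]    # swaps the two heaviest labels

tv_identity = total_variation(permute(q, identity), q)
tv_light = total_variation(permute(q, swap_light), q)
tv_heavy = total_variation(permute(q, swap_heavy), q)

print(tv_identity, tv_light, tv_heavy)
```

In this example, swapping two labels of equal mass leaves the distribution unchanged, so that labeling error is harmless, whereas swapping the two heaviest labels yields a distribution at distance 0.25 from the reference. A tester in this setting has to tell these two kinds of outcome apart from samples alone, without access to the permutation.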
