Identity testing under label mismatch

Clément L. Canonne; Karl Wimmer

arXiv:2105.01856·math.ST·May 6, 2021

TL;DR

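This paper studies identity testing when the data distribution is promised to equal the reference distribution up to a relabeling (permutation) of the domain, which models a systematic error in how labels are reported. The task is to decide whether that relabeling left the data distribution equal to the reference or statistically far from it.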

Contribution

It formalizes identity testing under promise of permutation, a natural setting that the standard identity-testing formulation fails to capture.

Findings

01

Develops new testing algorithms robust to label permutation errors

02

Provides theoretical bounds on testing accuracy under label mismatch

03

Demonstrates effectiveness through analysis and potential applications

Abstract

Testing whether the observed data conforms to a purported model (probability distribution) is a basic and fundamental statistical task, and one that is by now well understood. However, the standard formulation, identity testing, fails to capture many settings of interest; in this work, we focus on one such natural setting, identity testing under promise of permutation. In this setting, the unknown distribution is assumed to be equal to the purported one, up to a relabeling (permutation) of the model: however, due to a systematic error in the reporting of the data, this relabeling may not be the identity. The goal is then to test identity under this assumption: equivalently, whether this systematic labeling error led to a data distribution statistically far from the reference model.
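
The setting described in the abstract can be illustrated with a small numerical sketch. The abstract only says "statistically far"; measuring this with total variation distance is an assumption made here, and the helper names (total_variation, relabel) and the example distribution are hypothetical, not taken from the paper. The sketch works with the exact distributions rather than with samples, so it shows the quantity a tester would have to decide about, not a testing algorithm.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * float(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)).sum())

def relabel(q, perm):
    """Distribution of perm[X] when X ~ q: the mass q[i] is moved to label perm[i]."""
    q = np.asarray(q, dtype=float)
    p = np.empty_like(q)
    p[np.asarray(perm)] = q
    return p

# Hypothetical reference model on 4 labels.
q = np.array([0.5, 0.25, 0.125, 0.125])

identity = [0, 1, 2, 3]    # no labeling error
swap_light = [0, 1, 3, 2]  # swaps two labels of equal mass
swap_heavy = [1, 0, 2, 3]  # swaps the two heaviest labels

tv_identity = total_variation(relabel(q, identity), q)
tv_light = total_variation(relabel(q, swap_light), q)
tv_heavy = total_variation(relabel(q, swap_heavy), q)

print(tv_identity, tv_light, tv_heavy)  # 0.0 0.0 0.25
```

Note that a relabeling other than the identity does not necessarily change the distribution: swapping two labels of equal mass leaves it at distance 0 from the reference, while swapping the two heaviest labels moves it to distance 0.25. This is why the question is phrased in terms of the resulting data distribution rather than the permutation itself.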
