Testing Independence of Exchangeable Random Variables

Marcus Hutter

arXiv:2210.12392·math.ST·October 25, 2022

TL;DR

This paper develops statistical tests to determine whether exchangeable random variables are independent, addressing a key challenge in data analysis and deep learning where data often lacks independence.

Contribution

It introduces tests for independence of exchangeable random variables that make no structural assumptions on the underlying sample space, with potential application to real-world data such as internet-scraped datasets.

Findings

01 Tests can confidently reject independence in exchangeable data

02 High power for certain exchangeable distributions

03 Applicable to real-world data scenarios like deep learning

Abstract

Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed and have high power for (some) exchangeable distributions. We will make no structural assumptions on the underlying sample space. One potential application is in Deep Learning, where data is often scraped from the whole internet and duplicates abound, which can render data non-iid and test-set evaluation prone to give wrong answers.
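To make the setting concrete, the following minimal sketch builds a dataset that is exchangeable but not independent, by duplicating every item and shuffling, and compares its multiplicity profile with that of an iid sample. This is an illustration of the problem only, not the tests developed in the paper: the function names, the integer sample space, and the multiplicity profile as a summary statistic are all assumptions made for this example, and no p-value or rejection rule is computed.

```python
import random
from collections import Counter

# Hypothetical sample space for illustration: integers below 10**12.
SPACE = 10**12


def iid_sample(n, seed=0):
    """Draw n items independently and uniformly from the sample space."""
    rng = random.Random(seed)
    return [rng.randrange(SPACE) for _ in range(n)]


def duplicated_shuffled_sample(n_distinct, copies=2, seed=0):
    """Draw n_distinct distinct items, repeat each `copies` times, shuffle.

    A uniformly random shuffle makes the sequence exchangeable, but the
    items are dependent: seeing a value once guarantees it occurs again.
    """
    rng = random.Random(seed)
    base = rng.sample(range(SPACE), n_distinct)  # distinct by construction
    data = [x for x in base for _ in range(copies)]
    rng.shuffle(data)
    return data


def multiplicity_profile(data):
    """Map m -> number of distinct values that occur exactly m times."""
    return dict(Counter(Counter(data).values()))


if __name__ == "__main__":
    print("iid:       ", multiplicity_profile(iid_sample(1000)))
    print("duplicated:", multiplicity_profile(duplicated_shuffled_sample(500)))
```

In the duplicated sample every value occurs exactly twice, whereas an iid sample from a large space consists almost entirely of singletons. Turning such a difference into a test with a guaranteed error level over all iid distributions is the harder question the paper addresses.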

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference