Testing Independence of Exchangeable Random Variables
Marcus Hutter

TL;DR
This paper develops statistical tests to determine whether exchangeable random variables are independent, addressing a key challenge in data analysis and deep learning where data often lacks independence.
Contribution
It introduces new tests for independence of exchangeable variables without structural assumptions, applicable to real-world data like internet-scraped datasets.
Findings
Tests can confidently reject independence in exchangeable data
High power for certain exchangeable distributions
Applicable to real-world data scenarios like deep learning
Abstract
Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed and have high power for (some) exchangeable distributions. We will make no structural assumptions on the underlying sample space. One potential application is in Deep Learning, where data is often scraped from the whole internet and duplicates abound, which can render the data non-iid and make test-set evaluation prone to giving wrong answers.
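The duplication concern raised in the abstract can be illustrated with a small simulation. The sketch below is not one of the paper's tests; it only shows, under assumed toy settings, how duplicated items in a shuffled dataset leak from the training split into the test split. The function names, dataset sizes, and the 80/20 split are all hypothetical choices made for illustration.

```python
import random


def make_shuffled_data(n_unique, copies, seed=0):
    """Build a toy dataset: each of n_unique distinct items appears
    `copies` times, and the whole list is then shuffled.
    (Hypothetical helper, not from the paper.)"""
    rng = random.Random(seed)
    data = [item for item in range(n_unique) for _ in range(copies)]
    rng.shuffle(data)
    return data


def train_test_overlap(data, test_fraction=0.2):
    """Return the fraction of test items that also occur in the training split."""
    split = int(len(data) * (1 - test_fraction))
    train, test = data[:split], data[split:]
    train_set = set(train)
    return sum(item in train_set for item in test) / len(test)


# Assumed toy settings: 10,000 items in both cases.
clean = make_shuffled_data(n_unique=10_000, copies=1)  # no duplicates
duped = make_shuffled_data(n_unique=2_000, copies=5)   # every item appears 5 times

print("overlap without duplicates:", train_test_overlap(clean))
print("overlap with duplicates:   ", train_test_overlap(duped))
```

With no duplicates the overlap is exactly zero, while with five copies of each item most test items have already been seen during training, so a held-out score would mostly measure memorization rather than generalization.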
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Bayesian Modeling and Causal Inference
