Using Ramsey theory to measure unavoidable spurious correlations in Big Data
Micheal Pawliuk, Michael Alexander Waddell

TL;DR
This paper applies Ramsey theory to quantify unavoidable patterns in datasets, providing a new way to measure dataset randomness and meaningfulness of patterns, with applications to political and economic data.
Contribution
It introduces a novel approach combining Ramsey theory with statistical tools to analyze dataset structure and pattern significance.
Findings
Quantifies unavoidable patterns in datasets using Ramsey theory.
Measures how homogeneous and how transitive a subset of voters in the 1984 US Congress is.
Provides evidence of transitivity in global markets.
Abstract
Given a dataset, we quantify how many patterns must always exist in it. Formally, this is done through the lens of the Ramsey theory of graphs and a quantitative bound known as Goodman's theorem. Combining statistical tools with the Ramsey theory of graphs gives a nuanced understanding of how far a dataset is from random, and of what qualifies as a meaningful pattern. This method is applied to a dataset of repeated voters in the 1984 US Congress to quantify how homogeneous a subset of congressional voters is. We also measure how transitive a subset of voters is. Statistical Ramsey theory is also applied to global economic trading data to provide evidence that global markets are quite transitive.
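To make "patterns that must always exist" concrete, here is a minimal Python sketch of Goodman's bound. It is an illustration under stated assumptions, not the authors' code. It treats a dataset as a 2-colouring of the edges of the complete graph on n items (for example, red if two voters agree, blue if they disagree; this encoding is an assumption for illustration) and counts monochromatic triangles by brute force. The function names `goodman_lower_bound` and `count_monochromatic_triangles` are hypothetical.

```python
from itertools import combinations


def goodman_lower_bound(n):
    """Goodman's minimum number of monochromatic triangles in any
    2-colouring of the edges of the complete graph K_n."""
    if n % 2 == 0:            # n = 2u
        u = n // 2
        return u * (u - 1) * (u - 2) // 3
    u = n // 4
    if n % 4 == 1:            # n = 4u + 1
        return 2 * u * (u - 1) * (4 * u + 1) // 3
    return 2 * u * (u + 1) * (4 * u - 1) // 3   # n = 4u + 3


def count_monochromatic_triangles(n, red_edges):
    """Brute-force count of triangles whose three edges share a colour.

    Vertices are 0..n-1. Edges listed in red_edges are red; every other
    pair is blue. (Illustrative encoding, not taken from the paper.)
    """
    red = {frozenset(e) for e in red_edges}
    count = 0
    for a, b, c in combinations(range(n), 3):
        colours = {frozenset(p) in red for p in ((a, b), (a, c), (b, c))}
        if len(colours) == 1:  # all three edges red, or all three blue
            count += 1
    return count


# Example: colour K_6 so that the red edges form a complete bipartite
# graph between {0, 1, 2} and {3, 4, 5}; the blue edges then form two
# disjoint triangles.
bipartite_red = [(i, j) for i in range(3) for j in range(3, 6)]
observed = count_monochromatic_triangles(6, bipartite_red)
unavoidable = goodman_lower_bound(6)
print(observed, unavoidable)
```

A triangle count close to the bound suggests the monochromatic patterns seen are mostly the unavoidable ones; a count far above it is what one would then examine with statistical tools. How the paper combines the bound with those tools is not spelled out in this summary, so the comparison here is only a starting point.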
Taxonomy
Topics: Game Theory and Applications · Computability, Logic, AI Algorithms · Complex Systems and Time Series Analysis
