The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets
Noam Levi, Yaron Oz

TL;DR
This paper uses tools from statistical physics and Random Matrix Theory (RMT) to study universal statistical properties of complex datasets, revealing common eigenvalue scaling laws and spectral structure that can inform neural network analysis.
Contribution
It demonstrates that real-world datasets and synthetic correlated Gaussian data share the same eigenvalue scaling laws and RMT universality class, which gives new insight into data structure and sample size requirements.
Findings
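The power-law scaling of the bulk covariance eigenvalues differs sharply between real-world data and uncorrelated normally distributed data.
Gaussian data generated with long-range correlations can reproduce the eigenvalue scaling seen in real-world data.
Real-world datasets and the synthetic correlated model fall in the same RMT universality class, that of chaotic rather than integrable systems.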
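The first two findings can be illustrated with a minimal sketch. It is not the authors' construction: the target spectrum `rank ** -(1 + alpha)`, the value `alpha = 0.5`, the sample and feature counts, the fitting window, and the helper names `covariance_eigenvalues` and `bulk_power_law_slope` are all assumptions chosen for illustration. The sketch draws Gaussian data with and without correlations, forms the feature-feature covariance matrix, and fits a log-log slope to a window of its sorted eigenvalues.

```python
import numpy as np

def covariance_eigenvalues(X):
    """Eigenvalues of the feature-feature covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    return np.linalg.eigvalsh(cov)[::-1]

def bulk_power_law_slope(eigs, lo=0.05, hi=0.5):
    """Slope of log(eigenvalue) against log(rank) over a window of ranks.

    The window [lo, hi) is a hypothetical choice that skips the largest
    eigenvalues and the small-eigenvalue tail.
    """
    d = len(eigs)
    start, stop = max(int(lo * d), 1), int(hi * d)
    ranks = np.arange(1, d + 1, dtype=float)
    slope, _ = np.polyfit(np.log(ranks[start:stop]), np.log(eigs[start:stop]), 1)
    return slope

rng = np.random.default_rng(0)
n_samples, n_features, alpha = 4000, 200, 0.5  # illustrative values only

# Uncorrelated baseline: i.i.d. standard normal features.
X_iid = rng.standard_normal((n_samples, n_features))

# Correlated Gaussian data: population spectrum rank^-(1 + alpha) (assumed form),
# rotated by a random orthogonal matrix so that the features are correlated.
target = np.arange(1, n_features + 1, dtype=float) ** -(1.0 + alpha)
Q, _ = np.linalg.qr(rng.standard_normal((n_features, n_features)))
X_corr = (rng.standard_normal((n_samples, n_features)) * np.sqrt(target)) @ Q.T

slope_iid = bulk_power_law_slope(covariance_eigenvalues(X_iid))
slope_corr = bulk_power_law_slope(covariance_eigenvalues(X_corr))
print(f"uncorrelated slope: {slope_iid:.2f}")
print(f"correlated slope:   {slope_corr:.2f}")
```

Under these assumptions the correlated data should show a steep power-law decay with slope close to `-(1 + alpha)`, while the uncorrelated baseline stays nearly flat across the same window.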
Abstract
We study universal traits that emerge both in real-world complex datasets and in artificially generated ones. Our approach is to analogize data to a physical system and employ tools from statistical physics and Random Matrix Theory (RMT) to reveal their underlying structure. We focus on the feature-feature covariance matrix, analyzing both its local and global eigenvalue statistics. Our main observations are: (i) The power-law scalings that the bulk of its eigenvalues exhibit are vastly different for uncorrelated normally distributed data compared to real-world data, (ii) this scaling behavior can be completely modeled by generating Gaussian data with long-range correlations, (iii) both generated and real-world datasets lie in the same universality class from the RMT perspective, as chaotic rather than integrable systems, (iv) the expected RMT statistical behavior already…
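The chaotic versus integrable distinction in observation (iii) concerns local eigenvalue statistics. One standard diagnostic is the mean ratio of consecutive level spacings, which is roughly 0.53 for the Gaussian Orthogonal Ensemble (chaotic) and 2 ln 2 - 1 ≈ 0.39 for independent, Poisson-distributed levels (integrable). The sketch below is an illustration only: the choice of this particular statistic, the matrix sizes, the `trim` fraction, and the helper name `mean_spacing_ratio` are assumptions, not details taken from the abstract.

```python
import numpy as np

def mean_spacing_ratio(levels, trim=0.1):
    """Mean of min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over consecutive spacings.

    A fraction `trim` of levels is dropped at each spectral edge (a
    hypothetical choice) so that only bulk levels contribute.
    """
    levels = np.sort(np.asarray(levels, dtype=float))
    k = int(trim * len(levels))
    if k > 0:
        levels = levels[k:-k]
    s = np.diff(levels)
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return r.mean()

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 500  # illustrative sizes only

# Eigenvalues of the feature-feature covariance matrix of Gaussian data.
X = rng.standard_normal((n_samples, n_features))
cov_levels = np.linalg.eigvalsh(X.T @ X / n_samples)

# Independent levels, which show no level repulsion.
poisson_levels = rng.uniform(size=2000)

r_cov = mean_spacing_ratio(cov_levels)
r_poisson = mean_spacing_ratio(poisson_levels)
print(f"covariance spectrum: <r> = {r_cov:.3f}")
print(f"independent levels:  <r> = {r_poisson:.3f}")
```

The spacing ratio is convenient here because it does not require unfolding the spectrum, so the same function could be applied directly to the covariance eigenvalues of a real dataset.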
Taxonomy
Topics: Statistical Mechanics and Entropy · Neural Networks and Applications · Fractal and DNA sequence analysis
Methods: Pruning · Focus
