Random matrix approach to multivariate categorical data analysis
Aashay Patil, M. S. Santhanam

TL;DR
This paper introduces a random matrix framework for analyzing the spectra of similarity matrices built from categorical data. Applied to social and environmental datasets, it shows that their spectral statistics follow the universal behavior predicted by random matrix theory.
Contribution
It develops a novel random matrix approach for categorical data analysis, linking spectral properties to random matrix theory, and demonstrates its utility on real-world election and climate data.
Findings
Spectral properties of similarity matrices follow random matrix theory predictions.
Application to Indian election data reveals underlying correlation structures.
Sea level pressure data analysis shows universal spectral behavior.
Abstract
Correlation and similarity measures are widely used across the sciences and social sciences. Often the variables are not numbers but qualitative descriptors, called categorical data. We define and study a similarity matrix, as a measure of similarity, for the case of categorical data. This is of interest because of the deluge of publicly available categorical data, such as movie ratings, top-10 rankings, and social media data, that requires analysis. We show that the statistical properties of the spectra of similarity matrices constructed from categorical data follow the predictions of random matrix theory. We demonstrate this approach by applying it to data from the Indian general elections and to sea level pressures in the North Atlantic Ocean.
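The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's exact definition of the similarity matrix, so this example assumes a simple matching similarity: entry (i, j) is the fraction of observations in which variables i and j take the same category. The function names `similarity_matrix` and `spacing_ratios`, the synthetic data, and the choice of the consecutive spacing ratio as the spectral statistic are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def similarity_matrix(data):
    """Matching similarity between categorical variables (assumed definition).

    data: array of shape (n_vars, n_obs) holding category labels.
    Returns an (n_vars, n_vars) matrix whose (i, j) entry is the fraction
    of observations in which variables i and j share the same category.
    """
    data = np.asarray(data)
    # Broadcast to shape (n_vars, n_vars, n_obs) and compare labels pairwise.
    matches = data[:, None, :] == data[None, :, :]
    return matches.mean(axis=2)


def spacing_ratios(eigenvalues):
    """Ratios min(s_k, s_{k+1}) / max(s_k, s_{k+1}) of consecutive spacings."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    s = np.diff(ev)
    return np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])


# Synthetic stand-in for real data: independent, uniformly random categories.
rng = np.random.default_rng(0)
n_vars, n_obs, n_cats = 100, 500, 5
data = rng.integers(0, n_cats, size=(n_vars, n_obs))

S = similarity_matrix(data)
eigenvalues = np.linalg.eigvalsh(S)  # S is real symmetric
ratios = spacing_ratios(eigenvalues)

print("mean spacing ratio:", ratios.mean())
```

The spacing ratio is used here because it needs no unfolding of the spectrum. Commonly quoted reference values for its mean are about 0.53 for the Gaussian orthogonal ensemble and about 0.39 for uncorrelated (Poisson) levels, so the printed value gives a rough indication of which regime the spectrum resembles.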
