Thinning a Wishart Random Matrix
Ameer Dharamshi, Anna Neufeld, Lucy L. Gao, Daniela Witten, and Jacob Bien

TL;DR
This paper introduces a method to generate independent data matrices from only the sample mean and Wishart-distributed sample covariance, enabling data thinning without direct access to raw data.
Contribution
The paper provides the first thinning strategy for Wishart-distributed sample covariance matrices, so that independent data sets can be generated from summary statistics alone.
Findings
Independent data matrices can be generated from the sample mean and sample covariance alone.
The generated matrices can be recombined to recover the original sample mean and sample covariance.
The approach enables privacy-preserving data analysis and validation without access to the raw data.
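The findings above can be illustrated with a minimal Python sketch. It is not taken from the paper and may differ from the authors' construction. It relies on a standard fact: for Gaussian rows, the sample mean and sample covariance are sufficient, so a data matrix can be redrawn from them without knowing the parameters. The function name `regenerate_and_split`, its arguments, and the requirement n - 1 >= p are assumptions made for this illustration.

```python
import numpy as np


def regenerate_and_split(xbar, S, n, n1, rng):
    """Draw an n x p matrix whose sample mean is xbar and whose sample
    covariance is S, then split its rows into blocks of n1 and n - n1 rows.

    Hypothetical helper for illustration; assumes the original data had
    i.i.d. Gaussian rows and that n - 1 >= p.
    """
    xbar = np.asarray(xbar, dtype=float)
    S = np.asarray(S, dtype=float)
    p = xbar.shape[0]
    if n - 1 < p:
        raise ValueError("need n - 1 >= p so that S is positive definite")
    if not 0 < n1 < n:
        raise ValueError("n1 must satisfy 0 < n1 < n")

    # Upper-triangular R with R.T @ R = (n - 1) * S (the centered scatter).
    R = np.linalg.cholesky((n - 1) * S).T

    # Random orthonormal columns, each orthogonal to the all-ones vector:
    # center a Gaussian matrix, then take the Q factor of its QR decomposition.
    G = rng.standard_normal((n, p))
    G -= G.mean(axis=0)
    Q, T = np.linalg.qr(G)
    # Sign fix so the triangular factor has a positive diagonal; this makes
    # the orthonormal frame uniformly distributed.
    Q = Q * np.sign(np.diag(T))

    # Columns of Q @ R sum to zero and (Q @ R).T @ (Q @ R) = (n - 1) * S.
    X_new = xbar + Q @ R
    return X_new[:n1], X_new[n1:]
```

Under the Gaussian assumption, the redrawn matrix has the same distribution as the original data, so the two row blocks are independent of each other. Stacking them back together reproduces `xbar` and `S` up to floating-point error.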
Abstract
Recent work has explored data thinning, a generalization of sample splitting that involves decomposing a (possibly matrix-valued) random variable into independent components. In the special case of an n × p random matrix with independent and identically distributed N_p(μ, Σ) rows, Dharamshi et al. (2024a) provides a comprehensive analysis of the settings in which thinning is or is not possible: briefly, if Σ is unknown, then one can thin provided that n > 1. However, in some situations a data analyst may not have direct access to the data itself. For example, to preserve individuals' privacy, a data bank may provide only summary statistics such as the sample mean and sample covariance matrix. While the sample mean follows a Gaussian distribution, the sample covariance follows (up to scaling) a Wishart distribution, for which no thinning strategies have yet been…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Mechanics and Entropy · Rough Sets and Fuzzy Logic
