Optimal minimization of the covariance loss
Vishesh Jain, Ashwin Sah, Mehtaab Sawhney

TL;DR
This paper proves an optimal bound for covariance approximation via partition-based sigma algebras, with implications for privacy-preserving data synthesis and connections to statistical physics.
Contribution
It establishes the best possible bound for covariance loss using partitions, introduces an efficient construction algorithm, and links the problem to the pinning lemma.
Findings
Bound of order \(\frac{1}{\sqrt{\log k}}\) on the covariance loss for a partition into \(k\) sets.
Provides an efficient algorithm for constructing the partition.
Improves accuracy guarantees for private synthetic data.
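To see how slowly the bound decays in \(k\), write \(C\) for the implicit constant. Its value is not given here, so \(C\) is only a placeholder for illustration. Solving for the number of sets needed to guarantee a target loss \(\varepsilon\) gives:

\[
\frac{C}{\sqrt{\log k}} \le \varepsilon
\quad\Longleftrightarrow\quad
\log k \ge \frac{C^{2}}{\varepsilon^{2}}
\quad\Longleftrightarrow\quad
k \ge \exp\!\left(\frac{C^{2}}{\varepsilon^{2}}\right).
\]

Halving the target loss therefore multiplies the required \(\log k\) by four.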
Abstract
Let \(X\) be a random vector valued in \(\mathbb{R}^{m}\) such that \(\|X\|_{2} \le 1\) almost surely. For every \(k \ge 3\), we show that there exists a sigma algebra \(\mathcal{F}\) generated by a partition of \(\mathbb{R}^{m}\) into \(k\) sets such that \[\|\operatorname{Cov}(X) - \operatorname{Cov}(\mathbb{E}[X\mid\mathcal{F}]) \|_{\mathrm{F}} \lesssim \frac{1}{\sqrt{\log{k}}}.\] This is optimal up to the implicit constant and improves on a previous bound due to Boedihardjo, Strohmer, and Vershynin. Our proof provides an efficient algorithm for constructing \(\mathcal{F}\) and leads to improved accuracy guarantees for \(k\)-anonymous or differentially private synthetic data. We also establish a connection between the above problem of minimizing the covariance loss and the pinning lemma from statistical physics, providing an alternate (and much simpler) algorithmic proof in the important case…
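The quantity bounded in the abstract can be computed directly for an empirical distribution. The sketch below is not the paper's construction algorithm. It only evaluates the covariance loss for a partition supplied by the caller. The function name `covariance_loss`, the sample data, and the three example partitions are all assumptions made for illustration.

```python
import numpy as np


def covariance_loss(X, labels):
    """Frobenius norm of Cov(X) - Cov(E[X | F]) for the empirical
    distribution on the rows of X, where F is generated by the
    partition that `labels` induces on the rows."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    mean = X.mean(axis=0)
    centered = X - mean
    cov_x = centered.T @ centered / n

    # E[X | F]: replace every row by the mean of its partition cell.
    cond = np.empty_like(X)
    for cell in np.unique(labels):
        mask = labels == cell
        cond[mask] = X[mask].mean(axis=0)

    # E[X | F] has the same mean as X (tower property).
    cond_centered = cond - mean
    cov_cond = cond_centered.T @ cond_centered / n
    return np.linalg.norm(cov_x - cov_cond, ord="fro")


# Hypothetical sample: Gaussian points rescaled so that ||x||_2 <= 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))

trivial = np.zeros(len(X), dtype=int)   # one cell: E[X | F] is constant
halves = (X[:, 0] > 0).astype(int)      # two cells, split on the first coordinate
finest = np.arange(len(X))              # every point is its own cell

loss_trivial = covariance_loss(X, trivial)
loss_halves = covariance_loss(X, halves)
loss_finest = covariance_loss(X, finest)
```

By the law of total covariance, the matrix inside the norm equals the expected conditional covariance \(\mathbb{E}[\operatorname{Cov}(X\mid\mathcal{F})]\), which is positive semidefinite. Refining the partition can therefore only decrease the loss, from \(\|\operatorname{Cov}(X)\|_{\mathrm{F}}\) for the trivial partition down to zero when every sample point is its own cell.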
Taxonomy
Topics: Random Matrices and Applications · Credit Risk and Financial Regulations · Cryptography and Data Security
