Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs
Michał Dereziński

TL;DR
This paper introduces an efficient algorithmic framework for transforming data into nearly sub-gaussian random designs using sketching techniques, enabling the application of robust statistical guarantees to large datasets.
Contribution
It provides a computationally efficient method for producing data sketches that are close, in total variation distance, to sub-gaussian random designs, so that statistical guarantees known for sub-gaussian designs carry over to a wider range of tasks.
Findings
Constructs, in near-linear time, data sketches that are nearly indistinguishable (in total variation distance) from sub-gaussian random designs.
Enables direct application of statistical guarantees from sub-gaussian models.
Provides new approximation guarantees for sketched least squares.
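For readers unfamiliar with the setting, sketched least squares can be illustrated with a small toy experiment. The sketch below is only an illustration under stated assumptions: it uses a dense Gaussian sketching matrix (a standard textbook choice, not necessarily the sketching method analyzed in the paper), synthetic data, and hypothetical sizes `n`, `d`, `k`. It compares the residual of the sketched solution with the residual of the exact solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: n data points, d features, sketch size k.
n, d, k = 5000, 10, 200

# Synthetic regression data (an assumption for illustration only).
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares solution on the full data.
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Dense Gaussian sketch S (k x n), scaled so that E[S^T S] = I.
S = rng.standard_normal((k, n)) / np.sqrt(k)

# Solve the much smaller sketched problem min_x ||S A x - S b||^2.
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

def loss(x):
    """Squared residual on the full (unsketched) problem."""
    return np.sum((A @ x - b) ** 2)

# Approximation ratio: how much worse the sketched solution is than the optimum.
ratio = loss(x_sk) / loss(x_full)
print(f"approximation ratio: {ratio:.4f}")
```

The ratio is at least 1 by construction, since `x_full` minimizes the full loss; approximation guarantees for sketched least squares bound how far above 1 it can be as a function of the sketch size.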
Abstract
Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching…
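The idea of gaussianizing via averaging can be seen in a small numerical experiment. The following is a minimal sketch, not the paper's construction: it assumes uniform row sampling with random signs, synthetic Laplace-distributed data, and hypothetical helper names (`averaged_sketch`, `excess_kurtosis`). It uses the excess kurtosis of one coordinate as a crude proxy for how far the sketch rows are from Gaussian, which is a much weaker notion than the total variation distance the abstract refers to.

```python
import numpy as np

def averaged_sketch(A, k, s, rng):
    """Return a k x d sketch of A.

    Each sketch row is a signed sum of s uniformly sampled rows of A,
    scaled by 1/sqrt(s). This is an illustrative assumption, not the
    sketching method from the paper.
    """
    n, _ = A.shape
    idx = rng.integers(0, n, size=(k, s))          # sampled row indices (with replacement)
    signs = rng.choice([-1.0, 1.0], size=(k, s))   # independent random signs
    # A[idx] has shape (k, s, d); apply signs, sum over the s sampled rows.
    return (signs[:, :, None] * A[idx]).sum(axis=1) / np.sqrt(s)

def excess_kurtosis(x):
    """Sample excess kurtosis; approximately 0 for Gaussian data."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n, d, k = 20000, 4, 4000

# Toy data with heavier-than-Gaussian tails (Laplace has excess kurtosis 3).
A = rng.laplace(size=(n, d))

# Excess kurtosis of the first coordinate of the sketch rows,
# for increasing numbers s of averaged rows.
kurt = {}
for s in (1, 8, 64):
    sketch = averaged_sketch(A, k, s, rng)
    kurt[s] = excess_kurtosis(sketch[:, 0])
    print(f"s = {s:3d}: excess kurtosis = {kurt[s]:.3f}")
```

As `s` grows, the excess kurtosis of the sketch rows shrinks toward 0, which is the central-limit effect that averaging exploits. With `s = 1` the sketch is plain signed row sampling and inherits the tails of the data.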
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Multi-Objective Optimization Algorithms
