Low-Rank Thinning
Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

TL;DR
This paper introduces a low-rank analysis of sub-Gaussian thinning algorithms that applies to any distribution and any kernel, and demonstrates its usefulness in machine learning tasks such as attention approximation and data reordering.
Contribution
It provides a new low-rank theoretical framework for sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank.
Findings
Improved guarantees for attention approximation in transformers.
Accelerated stochastic gradient training through data reordering.
Distinguishing distributions in near-linear time.
Abstract
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic…
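To make the setting in the abstract concrete, the following is a minimal sketch of how summary quality and approximate low-rankness might be measured. It is not the paper's algorithm: it only builds the uniform-subsampling baseline that thinning methods are compared against, scores it with a kernel maximum mean discrepancy (MMD), and counts how many eigenvalues of the kernel matrix are non-negligible. The Gaussian kernel, the bandwidth, the tolerance, the sample sizes, and all function names (`gaussian_kernel`, `mmd`, `approx_rank`) are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd(points, summary, bandwidth=1.0):
    """Kernel MMD between the empirical distributions of points and summary."""
    k_pp = gaussian_kernel(points, points, bandwidth).mean()
    k_ps = gaussian_kernel(points, summary, bandwidth).mean()
    k_ss = gaussian_kernel(summary, summary, bandwidth).mean()
    return float(np.sqrt(max(k_pp - 2.0 * k_ps + k_ss, 0.0)))


def approx_rank(kernel_matrix, tol=1e-3):
    """Number of eigenvalues above tol times the largest eigenvalue."""
    eigvals = np.linalg.eigvalsh(kernel_matrix)
    return int(np.sum(eigvals > tol * eigvals.max()))


# Illustrative data: 256 points in 2 dimensions (an assumption, not from the paper).
rng = np.random.default_rng(0)
points = rng.normal(size=(256, 2))

# Uniform subsampling baseline: keep 16 points chosen at random.
idx = rng.choice(len(points), size=16, replace=False)
summary = points[idx]

baseline_mmd = mmd(points, summary)
rank = approx_rank(gaussian_kernel(points, points))
print(f"MMD of uniform subsample: {baseline_mmd:.4f}")
print(f"approximate rank of the 256 x 256 kernel matrix: {rank}")
```

A thinning algorithm would replace the random choice of `idx` with a more careful selection, and the same `mmd` function could then score its output against this baseline. The `approx_rank` check is only a rough proxy for the approximate low-rank condition mentioned in the abstract.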
Taxonomy
Topics: Surface Modification and Superhydrophobicity · Icing and De-icing Technologies · Optical Coatings and Gratings
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
