Distribution Compression in Near-linear Time

Abhishek Shetty; Raaz Dwivedi; Lester Mackey

arXiv:2111.07941 · stat.ML · October 19, 2022

TL;DR

This paper introduces Compress++, a meta-procedure that speeds up distribution compression (thinning) algorithms to near-linear time at a cost of at most a factor of 4 in error, making it practical to summarize probability distributions from large samples, including in high-dimensional settings.

Contribution

Compress++ is a meta-algorithm that speeds up existing thinning algorithms for distribution compression, bringing quadratic-time methods down to near-linear runtime while increasing error by at most a factor of 4.
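To make the idea of a meta-procedure concrete, here is a minimal sketch of one way such a wrapper can be organized: split the input into four quarters, compress each recursively, concatenate, and halve the result. The recursive structure, the parameter `g`, and all function names are assumptions for illustration and are not taken from this page. `random_halve` is a hypothetical placeholder that keeps a random half; a real run would plug in a halving algorithm such as kernel halving, and the final stage here simply repeats the halving step as a simplification.

```python
import numpy as np

def random_halve(points, rng):
    """Hypothetical placeholder halving step: keep a uniformly random half, in order."""
    keep = np.sort(rng.choice(len(points), size=len(points) // 2, replace=False))
    return points[keep]

def compress(points, halve, g, rng):
    """Recursively compress four quarters, then halve their concatenation.

    For n a power of 4 (n >= 4**g), this returns 2**g * sqrt(n) points.
    """
    n = len(points)
    if n <= 4 ** g:  # small inputs are returned untouched
        return points
    q = n // 4
    parts = [compress(points[i * q:(i + 1) * q], halve, g, rng) for i in range(4)]
    return halve(np.concatenate(parts), rng)

def compress_plus_plus(points, halve, g, rng):
    """Compress, then halve g more times to reach sqrt(n) points (simplified final stage)."""
    coreset = compress(points, halve, g, rng)
    for _ in range(g):
        coreset = halve(coreset, rng)
    return coreset

rng = np.random.default_rng(0)
sample = rng.standard_normal((4 ** 5, 2))  # n = 1024 points in 2 dimensions
coreset = compress_plus_plus(sample, random_halve, g=2, rng=rng)
print(coreset.shape)  # (32, 2), i.e. sqrt(1024) points
```

The point of the design is that the expensive halving step only ever runs on inputs much smaller than n, which is where the runtime savings would come from when the halving step is quadratic.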

Findings

01. Compress++ reduces the runtime of quadratic-time distribution compression algorithms to near-linear.

02. It increases error by at most a factor of 4 relative to the input algorithm.

03. In benchmarks, it matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.

Abstract

In distribution compression, one aims to accurately summarize a probability distribution $P$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $P$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n / n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n…
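As an illustration of the maximum mean discrepancy (MMD) mentioned in the abstract, the following sketch measures how well a subset of about √n points represents an n-point sample. It is a hedged example, not the paper's evaluation code: the Gaussian kernel, the bandwidth, and the function names are assumptions, the subset is chosen uniformly at random rather than by a thinning algorithm, and the MMD is computed against the empirical input sample rather than against $P$ itself.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd(x, y, bandwidth=1.0):
    """MMD between the empirical distributions of x and y (V-statistic form)."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return float(np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0)))

rng = np.random.default_rng(0)
n = 1024
sample = rng.standard_normal((n, 2))                      # stand-in for n input points
m = int(np.sqrt(n))                                       # target summary size, sqrt(n)
subset = sample[rng.choice(n, size=m, replace=False)]     # uniform subsample baseline

print("MMD(sample, subset):", mmd(sample, subset))
```

A thinning algorithm would aim to pick the subset so that this quantity is smaller than what a uniform subsample of the same size typically achieves.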

Code & Models

Repositories

microsoft/goodpoints · jax · Official

Videos

Distribution Compression in Near-Linear Time · slideslive

Taxonomy

Topics: Mathematical Approximation and Integration · Markov Chains and Monte Carlo Methods · Generative Adversarial Networks and Image Synthesis