TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

Yi Hao; Ayush Jain; Alon Orlitsky; Vaishakh Ravindrakumar

arXiv:2202.07172 · stat.ML · June 22, 2022

TL;DR

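This paper introduces TURF, a near-linear-time distribution learning algorithm that attains the best possible approximation factor of 2 relative to the closest piecewise-polynomial approximation. Earlier efficient algorithms guaranteed only 2.25 for piecewise-linear approximations, with the factor rising to 3 for degree 9 and above.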

Contribution

The paper presents a new distribution learning algorithm that attains the optimal approximation factor of 2 for every number of pieces $t$ and degree $d$ except the trivial case $(t, d) = (1, 0)$, where the optimal factor is 1. It also provides a method for estimating the best polynomial complexity (the number of pieces) for practical distributions.
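One way to state the factor-2 claim precisely, following the abstract's definitions and assuming the usual additive slack $\varepsilon$ for finite samples (the exact sample-size and probability conditions are in the paper and are not reproduced here): let $\mathcal{P}_{t,d}$ denote the $t$-piece degree-$d$ polynomials, $f$ the unknown distribution, and $\hat{f}$ the estimate built from its samples.

```latex
\mathrm{OPT}_{t,d}(f) = \inf_{p \in \mathcal{P}_{t,d}} \lVert p - f \rVert_1,
\qquad
\lVert \hat{f} - f \rVert_1 \le c \cdot \mathrm{OPT}_{t,d}(f) + \varepsilon .
```

Under this reading, $c_{t,d}$ is the smallest constant $c$ for which some estimator meets the bound for every $f$, and the paper's result is $c_{t,d} = 2$ for all $(t, d) \neq (1, 0)$.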

Findings

1. Achieves the optimal approximation factor of 2 relative to the best $t$-piece degree-$d$ polynomial approximation, for all $(t, d) \neq (1, 0)$.

2. Provides a near-linear-time, sample-efficient estimator.

3. Demonstrates improved empirical performance over existing algorithms.

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t \geq 1$ and $d \geq 0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0} = 1$, and it can be shown that $c_{t,d} \geq 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1} \leq 2.25$, and the bound rises quickly to $c_{t,d} \leq 3$ for $d \geq 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d} = 2$ for all $(t, d) \neq (1, 0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with a vastly varying number of pieces. We provide a method that…
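The abstract measures every learner against the $\ell_1$ distance from the distribution to its closest $t$-piece degree-$d$ polynomial. The sketch below computes that benchmark in the simplest setting, degree $d = 0$ (piecewise constants), for a discrete distribution on an ordered finite domain, using dynamic programming over interval breakpoints; within each interval, the best constant in $\ell_1$ is a median of the values. It only illustrates the quantity the learner is compared against. It is not TURF, which the abstract does not describe in enough detail to reproduce, and the function names are hypothetical.

```python
from typing import List, Sequence


def best_constant_error(values: Sequence[float]) -> float:
    """l1 error of the best single constant fit to `values`.

    Any median minimizes sum |v - c|; this uses the upper median.
    """
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return sum(abs(v - median) for v in values)


def opt_piecewise_constant(p: Sequence[float], t: int) -> float:
    """Smallest l1 distance from p to a function that is constant on at most
    t consecutive intervals of the domain {0, ..., len(p) - 1}.

    Hypothetical helper illustrating OPT_{t,0}; not the paper's algorithm.
    """
    k = len(p)
    if t < 1 or k == 0:
        raise ValueError("need t >= 1 and a non-empty distribution")

    # cost[i][j] = error of the best constant fit on the block p[i:j]
    cost = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i in range(k):
        for j in range(i + 1, k + 1):
            cost[i][j] = best_constant_error(p[i:j])

    inf = float("inf")
    # best[j] = min error covering the prefix p[:j] with at most s pieces,
    # where s is the number of rounds completed; with s = 0 only the
    # empty prefix can be covered.
    best: List[float] = [0.0] + [inf] * k
    for _ in range(t):
        new = best[:]  # keeping best[j] allows using fewer pieces
        for j in range(1, k + 1):
            for i in range(j):
                # last piece covers p[i:j]
                new[j] = min(new[j], best[i] + cost[i][j])
        best = new
    return best[k]


if __name__ == "__main__":
    p = [0.5, 0.0, 0.5]
    for t in (1, 2, 3):
        print(t, opt_piecewise_constant(p, t))
```

With `p = [0.5, 0.0, 0.5]` this prints an error of 0.5 for one and two pieces and 0.0 for three pieces, since three constant pieces reproduce the distribution exactly. The exhaustive dynamic program costs roughly $O(k^3 \log k + t k^2)$ time, so it is meant only for small illustrative domains, not as a near-linear-time method.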

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression