Learning $k$-Modal Distributions via Testing
Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio

TL;DR
This paper presents a computationally efficient algorithm for learning $k$-modal distributions over discrete domains, nearly matching the information-theoretic sample complexity, and introduces a novel property testing approach as a key component.
Contribution
The paper introduces the first efficient algorithm for learning $k$-modal distributions with near-optimal sample complexity, utilizing a new property testing method for distribution decomposition.
Findings
Algorithm runs in time polynomial in $k$, $\log(n)$, and $1/\epsilon$.
Sample complexity is close to the information-theoretic lower bound for $k \leq \tilde{O}(\log n)$.
Uses a novel property testing algorithm to decompose distributions into near-monotone components.
Abstract
A $k$-modal probability distribution over the discrete domain $\{1,\dots,n\}$ is one whose histogram has at most $k$ "peaks" and "valleys." Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics. In this paper we consider the problem of learning (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\hat{p}$ such that with high probability the total variation distance between $p$ and $\hat{p}$ is at most $\epsilon$. Our main goal is to obtain computationally efficient algorithms for this problem that use (close to) an…
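To make the two notions in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes the distribution is fully known and given as a list of probabilities, whereas the learning problem only grants sample access. The function names `count_direction_changes` and `total_variation` are hypothetical, chosen for illustration. The first counts how often the histogram switches between rising and falling, so a result of at most $k$ corresponds to a $k$-modal histogram; the second computes total variation distance as half the $L_1$ distance.

```python
def count_direction_changes(pmf):
    """Count switches between rising and falling in a histogram.

    Flat steps are ignored. A result of 0 means the histogram is monotone,
    1 means unimodal (a single peak or a single valley), and so on.
    """
    changes = 0
    last_sign = 0  # 0 until the first non-flat step is seen
    for a, b in zip(pmf, pmf[1:]):
        diff = b - a
        if diff == 0:
            continue
        sign = 1 if diff > 0 else -1
        if last_sign != 0 and sign != last_sign:
            changes += 1
        last_sign = sign
    return changes


def total_variation(p, q):
    """Total variation distance between two pmfs on the same domain,
    computed as half of their L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Illustrative inputs (assumed, not taken from the paper).
monotone = [0.4, 0.3, 0.2, 0.1]
unimodal = [0.1, 0.2, 0.4, 0.2, 0.1]

print(count_direction_changes(monotone))  # 0
print(count_direction_changes(unimodal))  # 1
print(total_variation([0.5, 0.5], [1.0, 0.0]))  # 0.5
```

A hypothesis $\hat{p}$ is acceptable in the sense of the abstract when `total_variation(p, p_hat)` is at most $\epsilon$; in the actual learning setting $p$ is unknown, so this check is only available in simulations.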
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
