The Sparse Hausdorff Moment Problem, with Application to Topic Models
Spencer Gordon, Bijan Mazaheri, Leonard J. Schulman, Yuval Rabani

TL;DR
This paper presents an efficient algorithm for identifying a k-component finite mixture distribution from its first 2k noisy moments. The algorithm achieves near-optimal sample complexity and a post-sampling runtime polynomial in k, and it lifts to learning pure topic models through a known reduction.
Contribution
The paper introduces a method for the sparse Hausdorff moment problem that lowers both the sample complexity and the runtime of mixture identification relative to prior approaches.
Findings
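Achieves sample complexity of (1/w_min)^2 * (1/ζ)^{O(k)}, where w_min is the smallest mixture weight and ζ is the minimum separation between distinct component success probabilities
Post-sampling runtime of O(k^{2+o(1)}) arithmetic operations
Requires moments known to additive accuracy w_min * ζ^{O(k)}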
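To make the notion of identifying a k-mixture from its moments concrete, here is a minimal sketch of the classical Prony method applied to exact moments. This is a textbook technique swapped in for illustration; it is not the paper's algorithm, and it does not reproduce the noise tolerance or runtime listed above. The function name `prony_recover` and the indexing convention (moments m_0 through m_{2k-1}, with m_0 = 1) are assumptions of this sketch.

```python
import numpy as np

def prony_recover(moments, k):
    """Recover k atoms and weights from exact moments m_0..m_{2k-1}.

    Hypothetical helper illustrating classical Prony; assumes noise-free input.
    """
    m = np.asarray(moments, dtype=float)
    # The monic polynomial p(x) = x^k + c_{k-1} x^{k-1} + ... + c_0 vanishing
    # on the atoms satisfies sum_j c_j * m_{i+j} = -m_{i+k} for i = 0..k-1.
    hankel = np.array([[m[i + j] for j in range(k)] for i in range(k)])
    c = np.linalg.solve(hankel, -m[k:2 * k])
    # np.roots expects coefficients from the highest degree down.
    atoms = np.sort(np.roots(np.concatenate(([1.0], c[::-1]))).real)
    # Weights solve the transposed Vandermonde system sum_l w_l * x_l^i = m_i.
    vandermonde = np.vander(atoms, N=k, increasing=True).T
    weights = np.linalg.solve(vandermonde, m[:k])
    return atoms, weights

# Example: a 3-mixture on [0,1] and its first six moments m_0..m_5.
true_atoms = np.array([0.2, 0.5, 0.9])
true_weights = np.array([0.3, 0.5, 0.2])
exact_moments = [float(np.sum(true_weights * true_atoms ** i)) for i in range(6)]
atoms, weights = prony_recover(exact_moments, k=3)
```

With exact moments the recovered atoms and weights match the true ones up to floating-point error. The Hankel system becomes ill-conditioned as atoms move closer together, which gives some intuition for why the required moment accuracy depends on ζ.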
Abstract
We consider the problem of identifying, from its first m noisy moments, a probability distribution on [0,1] of support k < ∞. This is equivalent to the problem of learning a distribution on m observable binary random variables X_1, X_2, ..., X_m that are iid conditional on a hidden random variable U taking values in {1, 2, ..., k}. Our focus is on accomplishing this with m = 2k, which is the minimum m for which verifying that the source is a k-mixture is possible (even with exact statistics). This problem, so simply stated, is quite useful: e.g., by a known reduction, any algorithm for it lifts to an algorithm for learning pure topic models. We give an algorithm for identifying a k-mixture using samples of m = 2k iid binary random variables using a sample of size (1/w_min)^2 * (1/ζ)^{O(k)} and post-sampling runtime of only…
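The equivalence between moments and conditionally iid binary variables can be checked numerically: if the hidden variable selects a success probability x with weight w, then the probability that the first j observed bits are all 1 equals the j-th moment, the sum of w * x^j over the components. The sketch below simulates this under stated assumptions; the function names `sample_mixture` and `empirical_moments` and the example parameters are hypothetical, and the simple prefix estimator is chosen for clarity rather than taken from the paper.

```python
import numpy as np

def sample_mixture(atoms, weights, m, n, rng):
    """Draw n rows of m binary variables, iid given a hidden component."""
    hidden = rng.choice(len(atoms), size=n, p=weights)
    probs = np.asarray(atoms)[hidden]
    return (rng.random((n, m)) < probs[:, None]).astype(int)

def empirical_moments(samples):
    """Estimate moments 1..m: fraction of rows whose first j bits are all 1."""
    prefix_all_ones = np.cumprod(samples, axis=1)
    return prefix_all_ones.mean(axis=0)

# Example with k = 2 components, so m = 2k = 4 observed bits per sample.
rng = np.random.default_rng(0)
atoms = np.array([0.2, 0.8])
weights = np.array([0.5, 0.5])
samples = sample_mixture(atoms, weights, m=4, n=200_000, rng=rng)
estimated = empirical_moments(samples)
exact = np.array([np.sum(weights * atoms ** j) for j in range(1, 5)])
```

The estimates approach the exact moments at the usual 1/sqrt(n) rate, so the accuracy needed in the moments translates directly into a sample-size requirement.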
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Bayesian Methods and Mixture Models
