Optimal Demixing of Nonparametric Densities
Jianqing Fan, Zheng Tracy Ke, Zhaoyang Shi

TL;DR
This paper introduces an estimator for unmixing convex combinations of nonparametric densities. It extends topic modeling from discrete to continuous variables, with applications in machine learning and large language models (LLMs).
Contribution
It proposes a weighted kernel density estimator whose group-specific weights are computed by topic modeling on histogram vectors and de-biased by U-statistics, and it shows that this estimator attains the optimal convergence rate for the problem.
Findings
The estimator's error converges at a rate that depends on the sample size, the number of mixture components, and the dimension.
A matching lower bound confirms the estimator's rate-optimality.
The method generalizes topic modeling to continuous data with theoretical guarantees.
Abstract
Motivated by applications in statistics and machine learning, we consider a problem of unmixing convex combinations of nonparametric densities. Suppose we observe $n$ groups of samples, where the $i$th group consists of $N_i$ independent samples from a $d$-variate density $f_i(x) = \sum_{k=1}^{K} \pi_i(k)\, g_k(x)$. Here, each $g_k(x)$ is a nonparametric density, and each $\pi_i$ is a $K$-dimensional mixed membership vector. We aim to estimate $g_1(x), \ldots, g_K(x)$. This problem generalizes topic modeling from discrete to continuous variables and finds its applications in LLMs with word embeddings. In this paper, we propose an estimator for the above problem, which modifies the classical kernel density estimator by assigning group-specific weights that are computed by topic modeling on histogram vectors and de-biased by U-statistics. For any $\beta > 0$, assuming that each $g_k(x)$ is in the…
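To make the idea of a "weighted kernel density estimator" concrete, here is a simplified oracle sketch. It is not the authors' method. It assumes $d = 1$, a Gaussian kernel, and that the membership vectors $\pi_i$ are known, where group $i$ draws from $f_i = \sum_k \pi_i(k) g_k$. The paper instead estimates the group-specific weights by topic modeling on histogram vectors and de-biases them with U-statistics, which this sketch does not attempt. The helper names `simulate_groups` and `weighted_kde`, and the least-squares weights $W = \Pi(\Pi^\top \Pi)^{-1}$, are hypothetical illustrative choices.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used as the kernel K(u)
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def simulate_groups(Pi, samplers, N, rng):
    """Hypothetical helper: draw N samples per group i from
    f_i = sum_k Pi[i, k] * g_k (one-dimensional case)."""
    groups = []
    for pi in Pi:
        # Pick a latent component for each draw, then sample from that g_k
        labels = rng.choice(len(pi), size=N, p=pi)
        groups.append(np.array([samplers[k](rng) for k in labels]))
    return groups

def weighted_kde(x_grid, groups, Pi, h):
    """Hypothetical oracle demixing: g_hat_k(x) = sum_i W[i, k] * fhat_i(x),
    where fhat_i is an ordinary KDE of group i and W = Pi (Pi^T Pi)^{-1}
    (least-squares inversion of F = Pi G, assuming Pi is known)."""
    W = Pi @ np.linalg.inv(Pi.T @ Pi)  # n x K group-specific weights
    # Per-group KDEs on the grid, shape (n, len(x_grid))
    F = np.stack([
        gaussian_kernel((x_grid[:, None] - X[None, :]) / h).mean(axis=1) / h
        for X in groups
    ])
    return W.T @ F  # K x len(x_grid): one estimated density per component

# Usage: two Gaussian components, memberships drawn from a Dirichlet
rng = np.random.default_rng(0)
n, K, N = 50, 2, 200
Pi = rng.dirichlet(np.ones(K), size=n)
samplers = [lambda r: r.normal(-2.0, 1.0), lambda r: r.normal(2.0, 1.0)]
groups = simulate_groups(Pi, samplers, N, rng)
x = np.linspace(-6.0, 6.0, 241)
G_hat = weighted_kde(x, groups, Pi, h=0.4)
```

Because each row of $\Pi$ sums to one, each column of $W$ also sums to one, so every $\hat g_k$ integrates to approximately one. The weights can be negative, however, so an estimate may dip slightly below zero in the tails.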
