Optimal Demixing of Nonparametric Densities

Jianqing Fan; Zheng Tracy Ke; Zhaoyang Shi

arXiv:2603.27457 · math.ST · March 31, 2026

TL;DR

This paper introduces a novel estimator for unmixing convex combinations of nonparametric densities, extending topic modeling to continuous variables with applications in machine learning and large language models.

Contribution

The paper proposes a weighted kernel density estimator whose group-specific weights are computed by topic modeling on histogram vectors and de-biased by U-statistics, and shows that it attains the optimal rate of convergence for this problem.

Findings

01

The estimator achieves a convergence rate depending on sample size, number of components, and dimension.

02

A matching lower bound confirms the estimator's rate-optimality.

03

The method generalizes topic modeling to continuous data with theoretical guarantees.

Abstract

Motivated by applications in statistics and machine learning, we consider a problem of unmixing convex combinations of nonparametric densities. Suppose we observe $n$ groups of samples, where the $i$th group consists of $N_{i}$ independent samples from a $d$-variate density $f_{i}(x) = \sum_{k=1}^{K} \pi_{i}(k) g_{k}(x)$. Here, each $g_{k}(x)$ is a nonparametric density, and each $\pi_{i}$ is a $K$-dimensional mixed membership vector. We aim to estimate $g_{1}(x), \dots, g_{K}(x)$. This problem generalizes topic modeling from discrete to continuous variables and finds applications in LLMs with word embeddings. In this paper, we propose an estimator for the above problem, which modifies the classical kernel density estimator by assigning group-specific weights that are computed by topic modeling on histogram vectors and de-biased by U-statistics. For any $\beta > 0$, assuming that each $g_{k}(x)$ is in the…
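To make the setup concrete, here is a minimal NumPy sketch of the observation model and of an oracle version of a weighted kernel density estimator. It is not the paper's estimator: it assumes the membership matrix is known, whereas the paper estimates the weights by topic modeling on histogram vectors with a U-statistic de-biasing step. The function names, the one-dimensional Gaussian components, the Gaussian kernel, and the bandwidth `h` are all illustrative assumptions. The idea is that stacking the groups gives $f = \Pi g$, so with a full-column-rank $\Pi$ the rows of its pseudo-inverse supply group-specific weights that recover each $g_{k}$ from the per-group density estimates.

```python
import numpy as np


def simulate_groups(Pi, means, N, rng):
    """Draw N samples per group from f_i = sum_k Pi[i, k] * g_k.

    Assumption for illustration only: d = 1 and g_k = Normal(means[k], 1).
    """
    means = np.asarray(means, dtype=float)
    groups = []
    for pi_i in Pi:
        # Pick a latent component for each sample, then draw from it.
        labels = rng.choice(len(pi_i), size=N, p=pi_i)
        groups.append(rng.normal(loc=means[labels], scale=1.0))
    return groups


def gaussian_kde_1d(samples, grid, h):
    """Classical Gaussian kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))


def oracle_demix(groups, Pi, grid, h):
    """Oracle weighted KDE: combine per-group KDEs with weights pinv(Pi).

    Pi is treated as known here (hypothetical oracle); in practice the
    weights would have to be estimated from the data.
    """
    W = np.linalg.pinv(Pi)  # shape (K, n); W @ Pi is the K x K identity
    F = np.stack([gaussian_kde_1d(x, grid, h) for x in groups])  # shape (n, G)
    return W @ F  # row k estimates g_k on the grid
```

Because every row of $\Pi$ sums to one, every row of the pseudo-inverse weights also sums to one, so each estimated $g_{k}$ integrates to approximately one; the weights can be negative, however, so the estimate itself is not guaranteed to be nonnegative.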
