Mixture models for data with unknown distributions
M. E. J. Newman

TL;DR
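This paper introduces a flexible class of mixture models for real-valued multivariate data whose distributions are of unknown form. The models cluster the data and estimate the density within each cluster at the same time, and they can be fitted either with an expectation-maximization (EM) algorithm or with a Bayesian non-parametric method based on a collapsed Gibbs sampler.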
Contribution
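It proposes a mixture modeling framework in which the probability density within each component is represented as an arbitrary combination of basis functions, and it describes two ways to fit the model: a numerically efficient EM algorithm and a more computationally demanding Bayesian approach.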
Findings
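The EM algorithm is numerically efficient but gives only point estimates of the probability densities.
The Bayesian method is more computationally demanding but yields full posterior distributions and an estimate of the number of components.
Applications demonstrate that the models can handle complex data, including strongly non-Gaussian distributions.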
Abstract
We describe and analyze a broad class of mixture models for real-valued multivariate data in which the probability density of observations within each component of the model is represented as an arbitrary combination of basis functions. Fits to these models give us a way to cluster data with distributions of unknown form, including strongly non-Gaussian or multimodal distributions, and return both a division of the data and an estimate of the distributions, effectively performing clustering and density estimation within each cluster at the same time. We describe two fitting methods, one using an expectation-maximization (EM) algorithm and the other a Bayesian non-parametric method using a collapsed Gibbs sampler. The former is numerically efficient, but gives only point estimates of the probability densities. The latter is more computationally demanding but returns a full Bayesian…
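The EM idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes the simplest possible basis (equal-width histogram bins on each variable) and additionally assumes that variables are independent within each component, neither of which is stated in the text above. The function name `fit_histogram_mixture` and all parameter names are hypothetical.

```python
import numpy as np

def fit_histogram_mixture(X, n_components=2, n_bins=10, n_iter=100, seed=0):
    """EM for a mixture whose component densities are piecewise constant.

    Assumptions (illustrative, not taken from the paper): the basis
    functions are equal-width histogram bins, variables are independent
    within each component, and no column of X is constant.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Assign every value to a bin; each bin is one basis function.
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = (hi - lo) / n_bins
    bins = np.clip(np.floor((X - lo) / width).astype(int), 0, n_bins - 1)
    onehot = np.eye(n_bins)[bins]              # shape (n, d, n_bins)

    # pi[k]: mixing weight; w[k, j, b]: weight of bin b for variable j.
    pi = np.full(n_components, 1.0 / n_components)
    w = rng.dirichlet(np.ones(n_bins), size=(n_components, d))

    tiny = 1e-12                               # numerical guard only
    for _ in range(n_iter):
        # E-step: posterior probability that point i belongs to component k.
        log_lik = np.einsum("idb,kdb->ik", onehot, np.log(w + tiny))
        log_lik -= np.log(width).sum()         # bin weights -> densities
        log_post = np.log(pi + tiny) + log_lik
        log_norm = np.logaddexp.reduce(log_post, axis=1, keepdims=True)
        resp = np.exp(log_post - log_norm)     # shape (n, n_components)

        # M-step: re-estimate mixing weights and basis weights.
        pi = resp.sum(axis=0) / n
        counts = np.einsum("ik,idb->kdb", resp, onehot) + tiny
        w = counts / counts.sum(axis=2, keepdims=True)

    return pi, w, resp

if __name__ == "__main__":
    # Two non-Gaussian (uniform) clusters in two dimensions.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.uniform(0, 1, (200, 2)), rng.uniform(2, 3, (200, 2))])
    pi, w, resp = fit_histogram_mixture(X, n_components=2, n_bins=8)
    print("mixing weights:", pi)
    print("hard cluster labels:", resp.argmax(axis=1))
```

The returned `resp` gives a soft division of the data, while `w` (divided by the bin widths) gives a point estimate of each component's density, mirroring the simultaneous clustering and density estimation the abstract describes. Like any EM fit, the result depends on the random initialization and may be a local optimum.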
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Target Tracking and Data Fusion in Sensor Networks
