Cluster and Feature Modeling from Combinatorial Stochastic Processes

Tamara Broderick; Michael I. Jordan; Jim Pitman

arXiv:1206.5862·math.ST·October 2, 2013

TL;DR

This paper extends Bayesian nonparametric modeling from clustering, where each data point belongs to exactly one group, to feature modeling, where each data point may belong to any nonnegative integer number of groups (called features or topics).

Contribution

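It gives a formal development of the feature modeling problem: it reviews the combinatorial stochastic process representations used for clustering, most notably the Dirichlet process and the Chinese restaurant process, and develops analogous representations for feature modeling.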

Findings

01. Develops representations for feature modeling that include the beta process and the Indian buffet process.

02. Draws the analogy between the stochastic processes underlying clustering and those underlying feature modeling.

03. Provides a formal development of Bayesian nonparametric feature modeling.

Abstract

One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as…
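
The distinction the abstract draws between clustering and feature modeling can be illustrated with a minimal sketch. It assumes the standard sequential ("customer") descriptions of the Chinese restaurant process and of the one-parameter Indian buffet process; it is not code from the paper. The function names and the parameter names `alpha` and `gamma` are hypothetical, chosen only for this illustration.

```python
import math
import random


def sample_crp(n, alpha, rng):
    """Chinese restaurant process sketch: each point joins exactly one cluster."""
    assignments = []  # assignments[i] = cluster label of data point i
    counts = []       # counts[k] = number of points already in cluster k
    for _ in range(n):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        assignments.append(k)
    return assignments


def sample_poisson(rate, rng):
    """Poisson draw by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(n, gamma, rng):
    """Indian buffet process sketch: each point gets a possibly empty feature set."""
    feature_sets = []  # feature_sets[i] = sorted feature labels of data point i
    counts = []        # counts[k] = number of earlier points with feature k
    for i in range(1, n + 1):
        # The i-th point takes existing feature k with probability counts[k] / i ...
        chosen = {k for k, m in enumerate(counts) if rng.random() < m / i}
        # ... and additionally introduces Poisson(gamma / i) brand-new features.
        for _ in range(sample_poisson(gamma / i, rng)):
            counts.append(0)
            chosen.add(len(counts) - 1)
        for k in chosen:
            counts[k] += 1
        feature_sets.append(sorted(chosen))
    return feature_sets


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
print(sample_crp(8, alpha=1.0, rng=rng))   # one cluster label per data point
print(sample_ibp(8, gamma=2.0, rng=rng))   # a list of feature labels per data point
```

In the first output every data point carries a single label, while in the second a data point may carry zero, one, or several labels, which is the "arbitrary nonnegative integer numbers of groups" setting described above.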
