Colouring and breaking sticks: random distributions and heterogeneous clustering

Peter J. Green

arXiv:1003.3988·stat.ME·March 23, 2010

Open Access

TL;DR

This paper reviews probabilistic results on the Dirichlet process and introduces a mixture model whose clusters carry 'colours': statistical characteristics are constant within a colour and differ between colours. The method is illustrated by clustering gene expression profiles.

Contribution

The paper proposes a mixture model with coloured clusters, extending Dirichlet process clustering so that clusters of different colours can have different statistical characteristics.

Findings

01 Model generalization to heterogeneously coloured clusters

02 Adaptation of Dirichlet process machinery to new models

03 Application to gene expression profile clustering

Abstract

We begin by reviewing some probabilistic results about the Dirichlet Process and its close relatives, focussing on their implications for statistical modelling and analysis. We then introduce a class of simple mixture models in which clusters are of different 'colours', with statistical characteristics that are constant within colours, but different between colours. Thus cluster identities are exchangeable only within colours. The basic form of our model is a variant on the familiar Dirichlet process, and we find that much of the standard modelling and computational machinery associated with the Dirichlet process may be readily adapted to our generalisation. The methodology is illustrated with an application to the partially-parametric clustering of gene expression profiles.
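For readers unfamiliar with the "breaking sticks" in the title, the following is a minimal sketch of the standard truncated stick-breaking construction of Dirichlet process weights, followed by a purely illustrative colouring of the sticks. The function name `stick_breaking_weights`, the concentration value `alpha=2.0`, the truncation at 50 sticks, the colour labels, and the rule of colouring each stick independently and uniformly are all assumptions made for this example; the abstract does not specify the paper's colouring mechanism, so this should not be read as the author's model.

```python
import random


def stick_breaking_weights(alpha, n_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Draws v_k ~ Beta(1, alpha) and sets w_k = v_k * prod_{j<k} (1 - v_j).
    Returns the weights and the length of stick left unbroken.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction of the remainder to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, leftover = stick_breaking_weights(alpha=2.0, n_sticks=50, rng=rng)

# Hypothetical colouring rule (an assumption, not taken from the paper):
# each stick independently receives one of two illustrative colour labels.
colour_labels = ["colour_A", "colour_B"]
colours = [rng.choice(colour_labels) for _ in weights]

# Total weight carried by each colour under this illustrative rule.
weight_by_colour = {c: 0.0 for c in colour_labels}
for w, c in zip(weights, colours):
    weight_by_colour[c] += w
```

The weights plus the unbroken remainder always sum to one, so with enough sticks the truncation error `leftover` becomes negligible. Grouping sticks by colour, as in `weight_by_colour`, gives one way to picture clusters that are exchangeable within a colour but not across colours.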

Taxonomy

Topics: Bayesian Methods and Mixture Models