Beta diffusion trees and hierarchical feature allocations
Creighton Heaukulani, David A. Knowles, Zoubin Ghahramani

TL;DR
This paper introduces the beta diffusion tree, a novel hierarchical model for feature allocations that allows objects to belong to multiple overlapping subsets, with applications demonstrated in gene expression and socioeconomic data analysis.
Contribution
The paper proposes the beta diffusion tree, a new generative model for hierarchical feature allocations with overlapping subsets, extending previous partition-based models like the Dirichlet diffusion tree.
Findings
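Models overlapping, hierarchically related subsets of objects (a feature allocation) rather than a single partition
Supports a hierarchically clustered factor analysis model, with inference over tree structures by Markov chain Monte Carlo
Demonstrated on gene expression and socioeconomic datasets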
Abstract
We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003), which defines a tree structure over partitions (i.e., non-overlapping subsets) of the objects. Unlike in the Dirichlet diffusion tree, multiple copies of a particle may exist and diffuse along multiple branches in the beta diffusion tree, and an object may therefore belong to multiple subsets of particles. We demonstrate how to build a hierarchically-clustered factor analysis model with the beta diffusion tree and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude…
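The distinction the abstract draws between a partition and a feature allocation can be made concrete with a small sketch. The example below is illustrative only and is not the paper's generative process: the object names, the subsets, and the helper functions `to_binary_matrix` and `is_partition` are all hypothetical. It shows that a feature allocation is a collection of possibly overlapping subsets, which can be encoded as a binary object-by-feature matrix, whereas a partition requires every object to appear in exactly one subset.

```python
from typing import List, Set


def to_binary_matrix(features: List[Set[str]], objects: List[str]) -> List[List[int]]:
    """Encode subsets as a binary matrix: rows are objects, columns are features."""
    return [[1 if obj in feature else 0 for feature in features] for obj in objects]


def is_partition(features: List[Set[str]], objects: List[str]) -> bool:
    """True if every object belongs to exactly one subset.

    Assumes each subset only contains names drawn from `objects`.
    """
    counts = {obj: 0 for obj in objects}
    for feature in features:
        for obj in feature:
            counts[obj] += 1
    return all(count == 1 for count in counts.values())


# Hypothetical objects and subsets, chosen only for illustration.
objects = ["a", "b", "c", "d"]

# A partition: non-overlapping subsets that cover every object once.
partition = [{"a", "b"}, {"c", "d"}]

# A feature allocation: subsets may overlap, so "b" and "c" belong to several.
feature_allocation = [{"a", "b", "c"}, {"b", "c", "d"}, {"c"}]

Z = to_binary_matrix(feature_allocation, objects)
for obj, row in zip(objects, Z):
    print(obj, row)

print("partition is a partition:", is_partition(partition, objects))
print("feature allocation is a partition:", is_partition(feature_allocation, objects))
```

In this encoding a row with more than one nonzero entry corresponds to an object that belongs to several subsets, which is the situation the beta diffusion tree is described as allowing and a partition-based model is not.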
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gene expression and cancer classification · Bioinformatics and Genomic Networks
