Dirichlet Fragmentation Processes
Hong Ge, Yarin Gal, Zoubin Ghahramani

TL;DR
This paper introduces the Dirichlet fragmentation process, a new probabilistic model for tree structures, combining Dirichlet processes with fragmentation theory, and demonstrates its effectiveness in hierarchical clustering and density estimation.
Contribution
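The paper proposes the Dirichlet fragmentation process (DFP), a class of probability distributions over trees that admits a stick-breaking construction. The DFP relates to the nested Chinese restaurant process (nCRP) in the same way the Dirichlet process relates to the Chinese restaurant process, and the paper builds a hierarchical mixture model on top of it.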
Findings
The DFP mixture model is reported to outperform comparable models in hierarchical clustering
The DFP mixture model is reported to give better density modelling results
The DFP offers a new theoretical framework for distributions over tree structures
Abstract
Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, first we review the theory of random fragmentation processes [Bertoin, 2006], and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). Then we define a general class of probability distributions over trees: the Dirichlet fragmentation process (DFP) through a novel combination of the theory of Dirichlet processes and random fragmentation processes. This DFP presents a stick-breaking construction, and relates to the nCRP in the same way the Dirichlet process relates to the Chinese restaurant process. Furthermore, we develop a novel hierarchical mixture model with the DFP, and empirically compare the new model to similar models in machine learning. Experiments show the DFP mixture…
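To make the phrase "stick-breaking construction" concrete, here is a minimal sketch of the standard truncated stick-breaking (GEM) construction for the ordinary Dirichlet process. It is not the DFP's own construction, which this summary does not spell out. The function name `stick_breaking_weights` and the parameters `alpha`, `num_sticks`, and `seed` are illustrative assumptions, not names from the paper.

```python
import random


def stick_breaking_weights(alpha, num_sticks, seed=0):
    """Truncated stick-breaking weights for a Dirichlet process (illustrative).

    Assumption: this is the textbook DP construction, shown only as background;
    it is not the paper's DFP construction.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of whatever stick is left.
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    # `remaining` is the mass left over because of the truncation.
    return weights, remaining


if __name__ == "__main__":
    weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10)
    print([round(w, 3) for w in weights], round(leftover, 3))
```

Smaller values of `alpha` tend to concentrate mass on the first few weights, while larger values spread it over more components. A tree-structured model would, roughly speaking, apply a partitioning step like this recursively, but how the DFP does so is defined in the paper itself.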
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms · Stochastic processes and statistical mechanics
