Unsupervised tree boosting for learning probability distributions
Naoki Awaya, Li Ma

TL;DR
This paper introduces an unsupervised tree boosting algorithm for estimating probability distributions. The algorithm builds on a novel notion of addition and residualization on distributions, applies to both univariate and multivariate cases, and is competitive with deep learning methods.
Contribution
It develops a new unsupervised boosting method with a novel concept of addition and residualization for probability distributions, extending to multivariate distributions with a new CDF definition.
Findings
The algorithm effectively reduces the Kullback-Leibler divergence from the true distribution.
Provides an analytic form of the fitted density and a generative sampling method.
Performs competitively with state-of-the-art deep learning density estimators.
Abstract
We propose an unsupervised tree boosting algorithm for inferring the underlying sampling distribution of an i.i.d. sample based on fitting additive tree ensembles in a fashion analogous to supervised tree boosting. Integral to the algorithm is a new notion of "addition" on probability distributions that leads to a coherent notion of "residualization", i.e., subtracting a probability distribution from an observation to remove the distributional structure from the sampling distribution of the latter. We show that these notions arise naturally for univariate distributions through cumulative distribution function (CDF) transforms and compositions due to several "group-like" properties of univariate CDFs. While the traditional multivariate CDF does not preserve these properties, a new definition of multivariate CDF can restore these properties, thereby allowing the notions of "addition" and…
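The univariate construction described in the abstract can be illustrated with a small sketch. It assumes one reading of the CDF-composition idea: "adding" G2 to G1 yields the distribution whose CDF is the composition G2(G1(x)), and "residualizing" an observation x by G1 means applying the transform G1(x). The two toy CDFs on [0, 1] (x^2 and x^3) and all function names are hypothetical choices for illustration, not taken from the paper, and no tree fitting is involved.

```python
import numpy as np

# Two toy CDFs on [0, 1] with closed-form inverses (illustrative assumptions).
def g1_cdf(x):
    return x ** 2

def g1_inv(u):
    return np.sqrt(u)

def g2_cdf(x):
    return x ** 3

def g2_inv(u):
    return np.cbrt(u)

def add_cdfs(first, second):
    """'Addition' under the assumed reading: the CDF of (first + second) is second(first(x))."""
    return lambda x: second(first(x))

def residualize(x, cdf):
    """'Residualization' under the assumed reading: push observations through the CDF."""
    return cdf(x)

rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)

# Generative sampling from F = G2 o G1: invert the components in reverse order.
x = g1_inv(g2_inv(u))

f_cdf = add_cdfs(g1_cdf, g2_cdf)   # here F(x) = x ** 6

r1 = residualize(x, g1_cdf)        # removes G1; what remains follows G2
r2 = residualize(r1, g2_cdf)       # removes G2; what remains is Uniform(0, 1)
```

Under these assumptions, each residualization step peels one component off the sampling distribution, in the same spirit as supervised boosting subtracting the current fit from the response, and the fully residualized sample `r2` recovers the uniform draws `u`.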
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference
