Forest Density Estimation

Han Liu; Min Xu; Haijie Gu; Anupam Gupta; John Lafferty; Larry Wasserman

arXiv:1001.1557 · stat.ML · October 21, 2010 · J. Mach. Learn. Res. · 81 cites

TL;DR

This paper introduces a method for high-dimensional density and graph estimation based on forest-structured graphical models, using kernel density estimates of the marginals and an approximation algorithm for forests with restricted tree sizes, backed by theoretical guarantees and empirical validation.

Contribution

It develops a forest-based density estimator with oracle inequalities, proposes an approximation algorithm for estimating forests with restricted tree sizes, and selects the tree size in a data-driven way.
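For reference, a forest-structured density factorizes in the standard Chow–Liu form below, which is what makes Kruskal's algorithm applicable. The notation (forest F with edge set E_F, mutual information I, entropy H) is assumed here for illustration rather than copied from the paper.

```latex
% Forest F = (V, E_F) over variables X_1, ..., X_d
p_F(x) = \prod_{(i,j) \in E_F} \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)} \prod_{k=1}^{d} p(x_k)

% Edge weight: mutual information of the bivariate marginal
I(X_i; X_j) = \int p(x_i, x_j) \log \frac{p(x_i, x_j)}{p(x_i)\, p(x_j)} \, dx_i \, dx_j

% Kullback-Leibler divergence from p to the forest model
D(p \,\|\, p_F) = \sum_{k=1}^{d} H(X_k) - H(X) - \sum_{(i,j) \in E_F} I(X_i; X_j)
```

Only the last sum depends on F, so the best forest maximizes total edge mutual information, a maximum weight spanning forest problem. Replacing the true marginals with kernel density estimates gives a plug-in version of the same problem.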

Findings

01

Effective density estimation without assuming the true distribution is forest-structured

02

Approximation algorithm for the NP-hard problem of restricting tree sizes in a forest

03

Empirical results show practical advantages over Gaussian models

Abstract

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest-structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess…
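The density-estimation pipeline in the abstract (kernel estimates of marginals, mutual-information edge weights, Kruskal's algorithm, pruning on held-out data) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernels, the rule-of-thumb bandwidth, the resubstitution plug-in estimate of mutual information, and the names `fit_forest`, `make_data`, and `max_tree_size` are all hypothetical. In particular, `max_tree_size` is a simple greedy cap that skips any edge whose merge would exceed the limit; it is not the paper's approximation algorithm.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)
TINY = 1e-300  # floor to avoid log(0) if all kernels underflow

def bandwidth(col):
    # Assumption: normal-reference rule of thumb h = 1.06 * sd * n^(-1/5).
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def kde1(train, h, x):
    # Univariate Gaussian kernel density estimate evaluated at x.
    s = sum(math.exp(-0.5 * ((x - t) / h) ** 2) for t in train)
    return max(s / (len(train) * h * SQRT_2PI), TINY)

def kde2(ti, tj, hi, hj, xi, xj):
    # Bivariate product-Gaussian kernel density estimate evaluated at (xi, xj).
    s = sum(math.exp(-0.5 * (((xi - a) / hi) ** 2 + ((xj - b) / hj) ** 2))
            for a, b in zip(ti, tj))
    return max(s / (len(ti) * hi * hj * 2.0 * math.pi), TINY)

def edge_log_ratio(cols, h, i, j, xi, xj):
    # log p(xi, xj) - log p(xi) - log p(xj), using training-data KDEs.
    return (math.log(kde2(cols[i], cols[j], h[i], h[j], xi, xj))
            - math.log(kde1(cols[i], h[i], xi))
            - math.log(kde1(cols[j], h[j], xj)))

def fit_forest(train, heldout, max_tree_size=None):
    d = len(train[0])
    cols = [[row[k] for row in train] for k in range(d)]
    h = [bandwidth(c) for c in cols]
    # Plug-in (resubstitution) mutual-information weight for every pair.
    weights = []
    for i in range(d):
        for j in range(i + 1, d):
            w = sum(edge_log_ratio(cols, h, i, j, r[i], r[j]) for r in train) / len(train)
            weights.append((w, i, j))
    weights.sort(reverse=True)
    # Kruskal's algorithm: scan edges by decreasing weight; union-find rejects cycles.
    parent, size = list(range(d)), [1] * d
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    order = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # edge would close a cycle
        if max_tree_size is not None and size[ri] + size[rj] > max_tree_size:
            continue  # hypothetical greedy size cap, not the paper's algorithm
        parent[ri] = rj
        size[rj] += size[ri]
        order.append((i, j))
    # Prune: keep the first k Kruskal edges that maximize held-out log-likelihood.
    # Univariate terms are identical for every k, so only edge terms are summed.
    best_k, best, running = 0, 0.0, 0.0
    for k, (i, j) in enumerate(order, start=1):
        running += sum(edge_log_ratio(cols, h, i, j, r[i], r[j]) for r in heldout)
        if running > best:
            best_k, best = k, running
    return order[:best_k]

def make_data(n, seed):
    # Toy data: (X0, X1) and (X2, X3) are strongly dependent pairs,
    # and the two pairs are independent of each other.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, c = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([a, a + 0.3 * rng.gauss(0, 1), c, c + 0.3 * rng.gauss(0, 1)])
    return rows

if __name__ == "__main__":
    train, heldout = make_data(150, seed=0), make_data(100, seed=1)
    print(fit_forest(train, heldout))                   # unrestricted forest
    print(fit_forest(train, heldout, max_tree_size=2))  # trees of at most 2 nodes
```

Pruning compares candidate forests (prefixes of the Kruskal edge order) by held-out log-likelihood. Because the univariate terms are the same for every candidate, only the edge log-ratio terms need to be accumulated, which keeps the selection step to a single pass over the edges.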

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Gene expression and cancer classification · Statistical Methods and Inference