Forest Density Estimation
Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman

TL;DR
This paper estimates high-dimensional densities and graphs with forest-structured undirected graphical models, built from kernel density estimates of the univariate and bivariate marginals, and backs the approach with an oracle inequality, an approximation algorithm for size-restricted forests, and empirical validation.
Contribution
It develops a forest-based density estimator with an oracle inequality on its excess risk, and proposes an approximation algorithm for estimating forests with restricted tree sizes, with the tree size selected by data splitting.
Findings
Effective density estimation without assuming true forest structure
Approximation algorithm for the NP-hard problem of finding a maximum weight spanning forest with restricted tree size
Empirical results show practical advantages over Gaussian models
Abstract
We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess…
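The forest-selection step mentioned in the abstract can be illustrated with a minimal sketch of Kruskal's algorithm for a maximum weight spanning forest. This is an illustration under stated assumptions, not the authors' implementation: the edge weights are taken as given (in the paper's setting they would come from kernel density estimates evaluated on held-out data), the function name `max_weight_forest` and the example weights are hypothetical, and the rule of stopping at the first non-positive weight is an assumption made here so that the result can be a forest rather than a full spanning tree.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def max_weight_forest(num_nodes: int, weights: Dict[Edge, float]) -> List[Edge]:
    """Greedy (Kruskal-style) maximum weight spanning forest.

    `weights` maps an undirected edge (i, j) to a score. Assumption: an edge
    is only worth adding if its score is strictly positive.
    """
    parent = list(range(num_nodes))  # union-find parent pointers

    def find(i: int) -> int:
        # Find the root of i's component, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    forest: List[Edge] = []
    # Visit edges from the largest weight to the smallest.
    for (i, j), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # assumed stopping rule: remaining edges cannot help
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Hypothetical weights on four variables; not taken from the paper.
example_weights = {(0, 1): 0.9, (1, 2): 0.5, (0, 2): 0.4, (2, 3): -0.1}
example_forest = max_weight_forest(4, example_weights)
```

With these made-up weights, edge (0, 2) is skipped because it would close a cycle, and edge (2, 3) is never added because its weight is negative, so variable 3 stays isolated and the result is a forest rather than a spanning tree.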
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Gene expression and cancer classification · Statistical Methods and Inference
