Median of Forests for Robust Density Estimation
Hongwei Wen, Annika Betken, Tao Huang

TL;DR
This paper introduces MFRDE, a robust density estimation method that takes a pointwise median over forest density estimators. It is designed to resist both local and non-local outliers, and it is reported to outperform existing robust kernel-based methods in theory and in practice.
Contribution
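The paper proposes MFRDE, an ensemble learning algorithm that makes density estimation robust by applying a pointwise median to forest density estimators fitted on subsampled datasets. Compared to existing robust kernel-based methods, it allows larger subsampling sizes, so less estimation accuracy is sacrificed for robustness.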
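The idea of a pointwise median over forest density estimators can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes data in the unit cube, purely random axis-aligned partition trees of fixed depth, and disjoint subsamples of equal size. The function names (`fit_tree`, `tree_density`, `forest_density`, `median_of_forests`) and all parameter values are hypothetical.

```python
import numpy as np

def fit_tree(X, lo, hi, depth, rng):
    """Randomly partition the box [lo, hi); return leaves as (lo, hi, count)."""
    if depth == 0:
        return [(lo, hi, len(X))]
    axis = rng.integers(len(lo))              # random split coordinate
    cut = rng.uniform(lo[axis], hi[axis])     # random split position
    left = X[:, axis] < cut
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[axis] = cut
    lo_right[axis] = cut
    return (fit_tree(X[left], lo, hi_left, depth - 1, rng)
            + fit_tree(X[~left], lo_right, hi, depth - 1, rng))

def tree_density(leaves, n, Q):
    """Histogram-type estimate at query points Q: count / (n * cell volume)."""
    out = np.zeros(len(Q))
    for lo, hi, count in leaves:
        vol = np.prod(hi - lo)
        if vol > 0:
            inside = np.all((Q >= lo) & (Q < hi), axis=1)
            out[inside] = count / (n * vol)
    return out

def forest_density(X, Q, n_trees, depth, rng):
    """Average the tree estimates over independently drawn random partitions."""
    d = X.shape[1]
    lo, hi = np.zeros(d), np.ones(d)          # assumption: data lie in [0, 1)^d
    est = [tree_density(fit_tree(X, lo, hi, depth, rng), len(X), Q)
           for _ in range(n_trees)]
    return np.mean(est, axis=0)

def median_of_forests(X, Q, n_groups, n_trees=20, depth=4, seed=0):
    """Pointwise median of forest estimates fitted on disjoint subsamples."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(X)), n_groups)
    forests = [forest_density(X[idx], Q, n_trees, depth, rng) for idx in groups]
    return np.median(forests, axis=0)

# Illustrative data: Beta(2, 2) inliers plus a tight cluster of outliers near 0.9.
rng = np.random.default_rng(1)
inliers = rng.beta(2.0, 2.0, size=(1900, 1))
outliers = rng.uniform(0.89, 0.91, size=(100, 1))
X = np.vstack([inliers, outliers])
Q = np.linspace(0.05, 0.95, 5).reshape(-1, 1)
estimate = median_of_forests(X, Q, n_groups=5)
print(estimate)
```

In this sketch, each forest sees only one subsample and the median is taken separately at every query point. The pointwise median of densities is not guaranteed to integrate to one, and the sketch does not renormalize it.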
Findings
MFRDE achieves convergence rates close to those obtained on uncontaminated data, even when many outliers are present.
It outperforms existing robust kernel methods in real data experiments.
MFRDE is effective in anomaly detection applications.
Abstract
Robust density estimation refers to the consistent estimation of the density function even when the data is contaminated by outliers. We find that existing forest density estimation at a certain point is inherently resistant to the outliers outside the cells containing the point, which we call \textit{non-local outliers}, but not resistant to the remaining outliers, which we call \textit{local outliers}. To achieve robustness against all outliers, we propose an ensemble learning algorithm called \textit{medians of forests for robust density estimation} (\textit{MFRDE}), which adopts a pointwise median operation on forest density estimators fitted on subsampled datasets. Compared to existing robust kernel-based methods, MFRDE enables us to choose larger subsampling sizes, sacrificing less accuracy for density estimation while achieving robustness. On the theoretical side, we introduce the local outlier exponent to…
Taxonomy
Topics: Statistical Methods and Inference
