Lumberjack: Better Differentially Private Random Forests through Heavy Hitter Detection in Trees
Christian Janos Lebeda, David Erb, Tudor Cebere, Aurélien Bellet

TL;DR
Lumberjack introduces a differentially private random forest method that constructs deep trees with privacy-preserving pruning, significantly improving utility over prior DP approaches.
Contribution
A novel DP heavy hitter detection algorithm for hierarchical data enables deeper trees and better utility in private random forests.
Findings
Lumberjack outperforms prior DP random forest methods on benchmark datasets.
The new heavy hitter detection algorithm has error scaling as $O_{\varepsilon,\delta}(\sqrt{\log h})$.
Deeper trees improve expressiveness and yield better privacy-utility trade-offs.
Abstract
Random forests are widely used in fields involving sensitive tabular data, but existing approaches to enforcing differential privacy (DP) typically degrade performance to the point of impracticality. In this paper, we introduce Lumberjack, a differentially private random forest algorithm that achieves substantially higher utility by constructing large random decision trees and then applying aggressive, privacy-preserving pruning to retain only sufficiently populated nodes. A key component of our approach is a novel $(\varepsilon,\delta)$-DP heavy hitter detection algorithm for hierarchical data, whose error is $O_{\varepsilon,\delta}(\sqrt{\log h})$ for trees of height $h$ and may be of independent interest. This favorable scaling enables the use of significantly deeper trees than in prior work, leading to improved expressiveness under privacy constraints. Our empirical evaluation on…
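The abstract does not spell out the pruning mechanism, so the following is only a rough illustration of the general idea it describes: build a data-independent random tree, then keep only nodes whose noisy record count clears a threshold. This is a naive baseline sketch, NOT Lumberjack's heavy hitter algorithm. It perturbs every node count with Gaussian noise; because each record reaches exactly one node per level, the count vector has L2 sensitivity $\sqrt{h}$, so the per-count noise grows like $\sqrt{h}$. For contrast, the paper reports error scaling as $O_{\varepsilon,\delta}(\sqrt{\log h})$. The function names, the rule of drawing one uniform feature and threshold per node (with features assumed pre-scaled to [0, 1)), and the threshold `tau` are all hypothetical choices made for this sketch.

```python
import math
import random


def build_random_splits(height, num_features, rng):
    """Draw a data-independent split (feature, threshold) for every internal node.

    Hypothetical rule: nodes are identified by their root-to-node path as a tuple of
    0/1 bits; features are assumed pre-scaled to [0, 1), so thresholds ~ Uniform[0, 1).
    """
    splits, frontier = {}, [()]
    for _ in range(height):
        next_frontier = []
        for path in frontier:
            splits[path] = (rng.randrange(num_features), rng.random())
            next_frontier.extend([path + (0,), path + (1,)])
        frontier = next_frontier
    return splits


def node_counts(X, splits, height):
    """Count records reaching each non-root node; each record hits one node per depth."""
    counts = {}
    for path in splits:  # every node of the complete tree gets a count, even if 0
        counts[path + (0,)] = 0
        counts[path + (1,)] = 0
    for x in X:
        path = ()
        for _ in range(height):
            feature, threshold = splits[path]
            path = path + ((1 if x[feature] >= threshold else 0),)
            counts[path] += 1
    return counts


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism calibration (valid for 0 < epsilon < 1)."""
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def private_prune(X, height, num_features, epsilon, delta, tau,
                  tree_seed=0, noise_rng=None):
    """Baseline DP pruning: noise all node counts, keep nodes above tau top-down."""
    splits = build_random_splits(height, num_features, random.Random(tree_seed))
    counts = node_counts(X, splits, height)
    # Adding/removing one record changes `height` counts by 1 -> L2 sensitivity sqrt(height).
    sigma = gaussian_sigma(math.sqrt(height), epsilon, delta)
    # SystemRandom avoids a fixed seed, but floating-point Gaussian sampling is still
    # not a hardened DP noise source; illustration only.
    noise_rng = noise_rng or random.SystemRandom()
    noisy = {path: c + noise_rng.gauss(0.0, sigma) for path, c in counts.items()}
    # Thresholding noisy counts is post-processing, so it costs no extra privacy.
    kept, frontier = set(), [()]
    for _ in range(height):
        next_frontier = []
        for path in frontier:
            for child in (path + (0,), path + (1,)):
                if noisy[child] >= tau:  # keep child only if its parent was kept
                    kept.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return kept, sigma
```

For example, `private_prune(X, height=4, num_features=2, epsilon=0.5, delta=1e-5, tau=100.0)` returns the set of surviving node paths together with the noise scale. Noise is added to every node of the complete tree, including empty ones, because releasing only the nodes that contain data would itself reveal which regions are populated.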
