Closed-Form Beta Distribution Estimation from Sparse Statistics with Random Forest Implicit Regularization
Jonathan R. Landers

TL;DR
This paper presents a closed-form method for estimating beta distributions from sparse data, improving ensemble classification accuracy and revealing implicit regularization effects in Random Forests, validated on ticket pricing and handwritten digit datasets.
Contribution
Introduces a novel closed-form estimator for beta distributions from limited statistics and links distributional accuracy to classification performance with implicit regularization insights.
Findings
Improved pairwise classification accuracy using recovered distributions.
Error bounds relating classification accuracy to distributional closeness.
Implicit regularization enhances tree diversity and predictor selection.
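One standard way to relate pairwise classification accuracy to distributional closeness is sketched below. It is a textbook argument given as an illustration, not necessarily the derivation used in the paper. It assumes two snapshot distributions $P$ and $Q$ with equal class priors, a classifier with accuracy $a \ge 1/2$, the mixture $M = (P+Q)/2$, and divergences measured in nats; the symbols $a$ and $M$ are introduced here.

```latex
% Bayes-optimal accuracy for balanced classes bounds any classifier's accuracy:
a \;\le\; \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{TV}(P,Q)
\quad\Longrightarrow\quad
\mathrm{TV}(P,Q) \;\ge\; 2a - 1 .

% Pinsker's inequality, with TV(P,M) = TV(Q,M) = TV(P,Q)/2:
\mathrm{JS}(P,Q)
  = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M)
  \;\ge\; \tfrac{1}{2}\,\mathrm{TV}(P,Q)^2
  \;\ge\; \tfrac{1}{2}\,(2a - 1)^2 .
```

Under these assumptions the Jensen-Shannon bound is quadratic in the accuracy margin $2a - 1$, while the total variation bound is linear in it.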
Abstract
This work advances distribution recovery from sparse data and ensemble classification through three main contributions. First, we introduce a closed-form estimator that reconstructs scaled beta distributions from limited statistics (minimum, maximum, mean, and median) via composite quantile and moment matching. The recovered parameters, when used as features in Random Forest classifiers, improve pairwise classification on time-series snapshots, validating the fidelity of the recovered distributions. Second, we establish a link between classification accuracy and distributional closeness by deriving error bounds that constrain total variation distance and Jensen-Shannon divergence, the latter exhibiting quadratic convergence. Third, we show that zero-variance features act as an implicit regularizer, increasing selection probability for mid-ranked predictors and producing…
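To make the idea of recovering a scaled beta distribution from four summary statistics concrete, here is a minimal Python sketch. It is not the paper's estimator: it assumes the well-known median approximation (α − 1/3)/(α + β − 2/3), which is reasonable only for α, β > 1, and combines it with exact mean matching. The function name `beta_from_summary` and its arguments are hypothetical.

```python
def beta_from_summary(lo, hi, mean, median):
    """Closed-form (alpha, beta) for a beta distribution scaled to [lo, hi].

    Illustrative assumption, not the paper's method: match the mean exactly
    and the median through the approximation
        median ~ (alpha - 1/3) / (alpha + beta - 2/3),  for alpha, beta > 1.
    """
    if not lo < hi:
        raise ValueError("require lo < hi")

    # Rescale the statistics to the unit interval.
    m = (mean - lo) / (hi - lo)
    q = (median - lo) / (hi - lo)
    if not (0.0 < m < 1.0 and 0.0 < q < 1.0):
        raise ValueError("mean and median must lie strictly inside (lo, hi)")
    if m == q:
        # Symmetric case: the mean/median pair does not pin down alpha + beta.
        raise ValueError("mean equals median; concentration is not identified")

    # With alpha = m * s and s = alpha + beta, the median equation
    #   q * (s - 2/3) = m * s - 1/3
    # solves to s = (2q - 1) / (3 (q - m)).
    s = (2.0 * q - 1.0) / (3.0 * (q - m))
    if s <= 0.0:
        raise ValueError("statistics are inconsistent with this approximation")

    return m * s, (1.0 - m) * s


# Usage: statistics generated from Beta(2, 5) scaled to the interval [10, 110].
alpha, beta = beta_from_summary(10.0, 110.0, 10.0 + 100.0 * 2 / 7, 10.0 + 100.0 * 5 / 19)
```

The recovered pair could then serve as features for a downstream classifier, as the abstract describes. Because the median formula is only an approximation, estimates with α or β near or below 1 should be treated with caution.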
Taxonomy
Topics: Consumer Market Behavior and Pricing
