TL;DR
This paper introduces Aurora, a nonparametric empirical Bayes method that applies order statistic regression to replicated noisy observations. It estimates effect sizes with near-optimal mean squared error and accommodates heteroskedastic noise.
Contribution
The paper presents Aurora, a regression-based approach that achieves near-Bayes optimal effect size estimation. It requires no assumptions about the effect size distribution or the noise, and the noise may be heteroskedastic across units.
Findings
Aurora with linear regression provably matches the performance of classical estimators such as the sample mean, the trimmed mean, the sample median, and their James-Stein shrunk versions.
The method is demonstrated on Internet-scale data from a large technology firm.
Aurora automates effect size estimation in complex, real-world datasets.
Abstract
We study empirical Bayes estimation of the effect sizes of units from noisy observations on each unit. We show that it is possible to achieve near-Bayes optimal mean squared error, without any assumptions or knowledge about the effect size distribution or the noise. The noise distribution can be heteroskedastic and vary arbitrarily from unit to unit. Our proposal, which we call Aurora, leverages the replication inherent in the observations per unit and recasts the effect size estimation problem as a general regression problem. Aurora with linear regression provably matches the performance of a wide array of estimators including the sample mean, the trimmed mean, the sample median, as well as James-Stein shrunk versions thereof. Aurora automates effect size estimation for Internet-scale datasets, as we demonstrate on data from a large technology firm.
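The regression recasting can be illustrated with a short sketch. Suppose the replicates of unit i are conditionally i.i.d. given that unit's parameters, with mean equal to the effect size mu_i. Then the expected value of one held-out replicate given the other replicates equals the posterior mean of mu_i given those other replicates. Regressing a held-out replicate on the remaining ones across all units therefore targets an empirical Bayes estimate. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: every unit has the same number K of replicates, the regression is ordinary least squares with an intercept on the sorted remaining replicates (their order statistics), and the final estimate averages the fitted values over every choice of held-out replicate. The function name `aurora_ols` is hypothetical.

```python
import numpy as np


def aurora_ols(Z):
    """Hypothetical sketch of Aurora with linear (OLS) regression.

    Assumes Z has shape (n, K): n units, K >= 2 replicates per unit.
    Returns an array of n effect-size estimates.
    """
    Z = np.asarray(Z, dtype=float)
    n, K = Z.shape
    if K < 2:
        raise ValueError("need at least 2 replicates per unit")

    estimates = np.zeros(n)
    for j in range(K):
        # Response: the held-out j-th replicate of every unit.
        y = Z[:, j]
        # Features: order statistics of the remaining K - 1 replicates.
        X = np.sort(np.delete(Z, j, axis=1), axis=1)
        # Add an intercept column and fit OLS pooled across all units.
        X1 = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
        # Accumulate the fitted conditional means for this held-out index.
        estimates += X1 @ coef

    # Average over the K choices of held-out replicate.
    return estimates / K


if __name__ == "__main__":
    # Synthetic heteroskedastic example (assumed data-generating process).
    rng = np.random.default_rng(0)
    n, K = 20000, 4
    mu = rng.normal(0.0, 0.5, size=n)      # true effect sizes
    sd = rng.uniform(1.0, 3.0, size=n)     # unit-specific noise levels
    Z = mu[:, None] + sd[:, None] * rng.normal(size=(n, K))

    mse_mean = np.mean((Z.mean(axis=1) - mu) ** 2)
    mse_aurora = np.mean((aurora_ols(Z) - mu) ** 2)
    print(f"sample mean MSE: {mse_mean:.3f}, Aurora-OLS sketch MSE: {mse_aurora:.3f}")
```

Sorting the remaining replicates makes the features invariant to the order in which replicates were recorded. Because the sample mean of the remaining replicates is a linear function of their order statistics, this linear fit can also represent shrunk versions of that mean.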
