Semiparametric Exponential Families for Heavy-Tailed Data
William Fithian, Stefan Wager

TL;DR
This paper introduces a semiparametric method for estimating the tail of a heavy-tailed population from a relatively small sample by borrowing strength from a larger sample drawn from a related background population.
Contribution
The paper develops a semiparametric method that models the tail of the small sample as an exponential tilt of the better-observed background tail, and uses the fitted tail to estimate the small-population mean.
Findings
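The induced estimator of the small-population mean outperforms methods that do not use the background sample, with both theoretical and empirical support.
Substantial efficiency gains over competing methods are shown in simulation and on data from a large controlled experiment conducted by Facebook.
The tail model relies on a robust sufficient statistic motivated by extreme value theory and needs only a relatively small sample from the target population.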
Abstract
We propose a semiparametric method for fitting the tail of a heavy-tailed population given a relatively small sample from that population and a larger sample from a related background population. We model the tail of the small sample as an exponential tilt of the better-observed large-sample tail, using a robust sufficient statistic motivated by extreme value theory. In particular, our method induces an estimator of the small-population mean, and we give theoretical and empirical evidence that this estimator outperforms methods that do not use the background sample. We demonstrate substantial efficiency gains over competing methods in simulation and on data from a large controlled experiment conducted by Facebook.
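The exponential-tilt idea in the abstract can be illustrated with a minimal sketch. It assumes the tilt has the form f_small(x) ∝ f_bg(x) · exp(θ · T(x)) on the tail, with T(x) = log x chosen here purely for illustration, and it treats the empirical background tail as a fixed carrier measure. The function names (tilt_weights, fit_tilt, tilted_mean) are hypothetical, and this is not the authors' estimator, only a simplified moment-matching version of the same idea.

```python
import math

def tilt_weights(bg_tail, theta, stat=math.log):
    """Normalized weights proportional to exp(theta * stat(x)) on the background tail."""
    scores = [theta * stat(x) for x in bg_tail]
    top = max(scores)  # subtract the max score for numerical stability
    raw = [math.exp(s - top) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

def fit_tilt(bg_tail, small_tail, stat=math.log, iters=100, tol=1e-10):
    """Find theta so the tilted background mean of stat(x) matches the
    small-sample mean of stat(x), using a clipped Newton iteration."""
    t_bg = [stat(x) for x in bg_tail]
    target = sum(stat(x) for x in small_tail) / len(small_tail)
    theta = 0.0
    for _ in range(iters):
        w = tilt_weights(bg_tail, theta, stat)
        mean = sum(wi * ti for wi, ti in zip(w, t_bg))
        var = sum(wi * (ti - mean) ** 2 for wi, ti in zip(w, t_bg))
        step = (target - mean) / var
        step = max(-1.0, min(1.0, step))  # clip the step to keep Newton stable
        theta += step
        if abs(step) < tol:
            break
    return theta

def tilted_mean(bg_tail, theta, stat=math.log):
    """Estimate the small-population tail mean as a reweighted background mean."""
    w = tilt_weights(bg_tail, theta, stat)
    return sum(wi * x for wi, x in zip(w, bg_tail))
```

In this sketch, the many background points supply the shape of the tail, while the small sample only has to pin down the single parameter θ. That is the intuition for why borrowing the background sample could reduce variance relative to using the small sample alone.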
Taxonomy
Topics: Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics · Statistical Methods and Bayesian Inference
