Repeated out of Sample Fusion in the Estimation of Small Tail Probabilities
Benjamin Kedem, Lemeng Pan, Paul Smith, and Chen Wang

TL;DR
This paper introduces a repeated out-of-sample data fusion method for estimating very small tail probabilities, aimed at cases where the data are scarce or contain no values near the threshold of interest.
Contribution
It proposes a statistical approach that repeatedly fuses the observed data with externally generated random data to estimate small tail probabilities, and reports that it outperforms the Peaks over Threshold (POT) method from extreme value theory.
Findings
The method provides relatively short, yet reliable, interval estimates from moderately large samples.
It outperforms the Peaks over Threshold (POT) method in simulations and real data.
The approach enhances small tail probability estimation in practical applications.
Abstract
Often, it is required to estimate the probability that a quantity such as toxicity level, plutonium, temperature, rainfall, damage, wind speed, wave size, earthquake magnitude, risk, etc., exceeds an unsafe high threshold. The probability in question is then very small. To estimate such a probability, information is needed about large values of the quantity of interest. However, in many cases, the data only contain values below or even far below the designated threshold, let alone exceedingly large values. It is shown that by repeated fusion of the data with externally generated random data, more information about small tail probabilities is obtained with the aid of certain new statistical functions. This provides relatively short, yet reliable interval estimates based on moderately large samples. A comparison of the approach with a method from extreme values theory (Peaks over…
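To make the difficulty described in the abstract concrete, here is a minimal sketch of why a plain empirical estimate tends to fail for very small tail probabilities. It is not the authors' fusion method. The unit exponential distribution, the sample size of 100, the target probability of 1e-4, and the function names are all assumptions chosen purely for illustration.

```python
import math
import random


def empirical_tail_prob(sample, threshold):
    """Fraction of observations strictly above the threshold."""
    return sum(x > threshold for x in sample) / len(sample)


def prob_no_exceedance(p, n):
    """Probability that n i.i.d. draws all stay at or below a threshold
    whose true exceedance probability is p."""
    return (1.0 - p) ** n


# Assumed setup: X ~ Exp(1), so P(X > t) = exp(-t).
p_true = 1e-4
threshold = -math.log(p_true)  # threshold with tail probability p_true
n = 100

rng = random.Random(0)  # fixed seed so the demo is repeatable
sample = [rng.expovariate(1.0) for _ in range(n)]

print("largest observation:", max(sample))
print("threshold:", threshold)
print("empirical estimate:", empirical_tail_prob(sample, threshold))
print("chance the sample has no exceedance:", prob_no_exceedance(p_true, n))
```

Under these assumptions, a sample of 100 observations contains no value above the threshold about 99% of the time, in which case the empirical estimate is exactly zero and carries no information about the tail. This is the situation in which the abstract's fusion with externally generated data is meant to help.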
Taxonomy
Topics: Hydrology and Drought Analysis · Financial Risk and Volatility Modeling · Statistical and numerical algorithms
