Fitting the BumpHunter test statistic distribution and global p-value estimation
Louis Vaslin, Samuel Calvet, Vincent Barra, Julien Donini

TL;DR
This paper introduces a new method to efficiently estimate the global p-value in the BumpHunter algorithm, reducing computational resources while maintaining accuracy in high-energy physics data analysis.
Contribution
It proposes a functional fitting approach to estimate the BumpHunter test statistic distribution, improving speed and efficiency over traditional pseudo-data sampling methods.
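As a rough illustration of what fitting the tail of a test statistic distribution can look like, the sketch below fits a simple exponential to the pseudo-data values above a threshold and extrapolates the p-value beyond the sampled range. The exponential form, the threshold, the toy sample, and the function names `fit_exponential_tail` and `extrapolated_p_value` are all assumptions made for this example. The functional form actually used in the paper is not given in this summary and may well differ.

```python
import math
import random


def fit_exponential_tail(t_pseudo, threshold):
    """Fit an exponential to the excesses above `threshold`.

    Returns (tail_fraction, mean_excess). The mean excess is the
    maximum-likelihood estimate of the exponential scale.
    NOTE: the exponential tail is a placeholder assumption, not the
    functional form proposed in the paper.
    """
    excesses = [t - threshold for t in t_pseudo if t > threshold]
    if not excesses:
        raise ValueError("no pseudo-data test statistic above the threshold")
    tail_fraction = len(excesses) / len(t_pseudo)
    mean_excess = sum(excesses) / len(excesses)
    return tail_fraction, mean_excess


def extrapolated_p_value(t_obs, threshold, tail_fraction, mean_excess):
    """P(t >= t_obs) from the fitted tail; only meaningful for t_obs >= threshold."""
    if t_obs < threshold:
        raise ValueError("the tail fit is only valid above the threshold")
    return tail_fraction * math.exp(-(t_obs - threshold) / mean_excess)


# Toy pseudo-data test statistics (hypothetical stand-in, not BumpHunter output).
random.seed(0)
toy_t_pseudo = [random.expovariate(1.0) for _ in range(10_000)]

toy_threshold = 2.0  # arbitrary choice for this illustration
frac, scale = fit_exponential_tail(toy_t_pseudo, toy_threshold)
p_far_tail = extrapolated_p_value(12.0, toy_threshold, frac, scale)
print(f"tail fraction = {frac:.4f}, fitted scale = {scale:.3f}")
print(f"extrapolated p-value at t = 12: {p_far_tail:.3e}")
```

The point of the sketch is that a fitted curve can return a non-zero p-value at a value of the test statistic that none of the 10,000 toy samples reaches. How trustworthy such an extrapolation is depends entirely on how well the chosen form describes the true tail.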
Findings
Achieves global significance estimation with about 5% precision up to 5σ
Reduces computational resources needed for significance calculation
Provides a practical alternative to extensive pseudo-data generation
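To give a sense of scale for the resource savings mentioned above, the following sketch converts a significance in units of σ to a p-value and estimates how many pseudo-data histograms would be needed for even one of them to be expected beyond that level. It assumes the one-sided Gaussian convention for significance, and the helper names are hypothetical.

```python
import math


def one_sided_p_value(z_sigma):
    """Upper-tail probability of a standard normal beyond z_sigma.

    Assumes the one-sided convention for converting between
    p-value and significance.
    """
    return 0.5 * math.erfc(z_sigma / math.sqrt(2.0))


def min_pseudo_experiments(z_sigma, expected_exceedances=1.0):
    """Sample size for which `expected_exceedances` pseudo-experiments
    are expected beyond z_sigma under the background-only hypothesis."""
    return math.ceil(expected_exceedances / one_sided_p_value(z_sigma))


for z in (3.0, 4.0, 5.0):
    print(f"{z:.0f} sigma: p = {one_sided_p_value(z):.3e}, "
          f"N >= {min_pseudo_experiments(z):,}")
```

Under this convention, 5σ corresponds to a p-value of about 2.9 × 10⁻⁷, so a purely empirical estimate needs several million pseudo-data histograms before a single one is expected to exceed the observed test statistic, and many more for a stable estimate.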
Abstract
In High Energy Physics, it is common to look for a localized deviation in data with respect to a given reference. For this task, the well-known BumpHunter algorithm allows for a model-independent deviation search with the advantage of estimating a global p-value to account for the Look Elsewhere Effect. However, this method relies on the generation and scan of thousands of pseudo-data histograms sampled from the reference background. Thus, accurately calculating a global significance requires a lot of computing resources. In order to speed up this process and improve the algorithm, we propose in this paper a solution to estimate the global p-value using a more reasonable number of pseudo-data histograms. This method uses a functional form inspired by similar statistical problems to fit the test statistic distribution. We have found that this alternative method allows to…
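The pseudo-data approach described in the abstract can be sketched as follows: given the test statistic observed in data and the test statistics obtained by scanning background-only pseudo-data histograms, the global p-value is the fraction of pseudo-data values at least as large as the observed one. This is a minimal sketch under stated assumptions. The function names are hypothetical, the toy test statistics are drawn from an arbitrary exponential rather than from a real BumpHunter scan, and the one-sided Gaussian convention is assumed for the significance.

```python
import math
import random
from statistics import NormalDist


def global_p_value(t_obs, t_pseudo):
    """Fraction of pseudo-data test statistics at least as large as t_obs."""
    n_above = sum(1 for t in t_pseudo if t >= t_obs)
    return n_above / len(t_pseudo)


def significance(p_value):
    """One-sided Gaussian significance for a given p-value.

    A p-value of exactly 0 means no pseudo-experiment reached t_obs, so
    only a lower bound on the significance is available; this is
    reported as +inf here.
    """
    if p_value <= 0.0:
        return math.inf
    if p_value >= 1.0:
        return -math.inf
    return -NormalDist().inv_cdf(p_value)


# Toy stand-in for the test statistics of scanned pseudo-data histograms.
random.seed(1)
toy_t_pseudo = [random.expovariate(1.0) for _ in range(5000)]

for t_obs in (3.0, 6.0, 20.0):
    p = global_p_value(t_obs, toy_t_pseudo)
    print(f"t_obs = {t_obs:4.1f}: global p-value = {p:.4g}, "
          f"significance = {significance(p):.2f} sigma")
```

When the observed test statistic lies beyond every pseudo-data value, the empirical p-value is zero and the sample can only bound the significance from below. This is the regime where a fitted description of the test statistic distribution becomes useful.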
Taxonomy
Topics: Big Data Technologies and Applications · Computational Physics and Python Applications · Data Analysis with R
