Calculating $p$-values and their significances with the Energy Test for large datasets

W. Barter; C. Burr; C. Parkes

arXiv:1801.05222·physics.data-an·April 19, 2018

TL;DR

This paper introduces a scalable method for calculating $p$-values with the energy test: the null distribution of the test statistic for a large dataset is obtained by scaling the distribution found for smaller samples, which makes the analysis of large datasets efficient.

Contribution

It proposes a novel approach that determines the null distribution of the energy test statistic for large samples by scaling the distribution found for small samples, improving computational efficiency.
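The scaling idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Gaussian-weighted form of the $T$-value commonly used in particle physics with a hypothetical distance scale `delta = 0.5`, builds the small-sample null distribution by randomly re-splitting (permuting) a pooled subsample, and assumes that $N \cdot T$ has approximately the same null distribution at different per-sample sizes $N$ (as expected asymptotically for this kind of statistic), so small-sample $T$-values are multiplied by $N_\text{small}/N_\text{large}$. All function names are hypothetical, and the two samples are assumed to keep the same size ratio at both scales.

```python
import math
import random


def energy_t_value(xs, ys, delta=0.5):
    """T-value for two samples of points (equal-length tuples).

    Assumption: Gaussian weighting psi(d) = exp(-d^2 / (2 delta^2)).
    """
    def psi(a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2.0 * delta ** 2))

    n, m = len(xs), len(ys)
    # Within-sample terms: sums over distinct pairs i < j.
    t_xx = sum(psi(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n))
    t_yy = sum(psi(ys[i], ys[j]) for i in range(m) for j in range(i + 1, m))
    # Between-sample term: sum over all cross pairs.
    t_xy = sum(psi(x, y) for x in xs for y in ys)
    return t_xx / (n * (n - 1)) + t_yy / (m * (m - 1)) - t_xy / (n * m)


def permutation_null(xs, ys, n_perm, rng, delta=0.5):
    """T-values under the null from random re-splits of the pooled sample."""
    pooled = list(xs) + list(ys)
    n = len(xs)
    out = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        out.append(energy_t_value(pooled[:n], pooled[n:], delta))
    return out


def scaled_null(small_t_values, n_small, n_large):
    """Hypothetical scaling step: assume N * T has roughly the same null
    distribution at both sizes, so T_large ~ T_small * n_small / n_large."""
    factor = n_small / n_large
    return [t * factor for t in small_t_values]


def p_value(t_obs, null_t_values):
    """Fraction of null T-values at least as large as the observed one,
    using the conservative (k + 1) / (N + 1) convention."""
    k = sum(1 for t in null_t_values if t >= t_obs)
    return (k + 1) / (len(null_t_values) + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two large 2D samples from the same population (null hypothesis true).
    big_x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    big_y = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
    t_obs = energy_t_value(big_x, big_y)

    # Null distribution from re-splits of a small pooled subsample,
    # then rescaled to the large per-sample size.
    small = rng.sample(big_x + big_y, 100)
    null_small = permutation_null(small[:50], small[50:], 200, rng)
    null_large = scaled_null(null_small, n_small=50, n_large=300)
    print(f"T = {t_obs:.2e}, p = {p_value(t_obs, null_large):.3f}")
```

The double loops make each $T$-value cost $O(N^2)$, which is why building the null distribution by permutation directly at large $N$ is expensive; in this sketch only the single observed $T$-value is computed at full size.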

Findings

1. In a simple test case, the distribution of the test statistic under the null hypothesis is not well modeled by the generalized extreme value function.

2. A new scaling method estimates $p$-values for large datasets accurately from the test-statistic distribution of small samples.

3. The method makes the energy test practical for large datasets.
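The title refers to converting $p$-values into significances. A minimal sketch of that conversion follows, assuming the one-sided Gaussian convention common in particle physics, $Z = \Phi^{-1}(1 - p)$; the function name `significance` is hypothetical.

```python
from statistics import NormalDist


def significance(p_value):
    """One-sided Gaussian significance Z with P(X >= Z) = p for X ~ N(0, 1).

    Uses -inv_cdf(p) instead of inv_cdf(1 - p) so that very small p-values
    do not lose precision when 1 - p is rounded in floating point.
    """
    if not 0.0 < p_value < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    return -NormalDist().inv_cdf(p_value)


# Approximately 3 and 5 standard deviations respectively.
for p in (0.0013499, 2.87e-7):
    print(f"p = {p:.3g} -> Z = {significance(p):.2f} sigma")
```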

Abstract

The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the $T$-value). The method has recently been used in particle physics to search for differences between samples that arise from CP violation. The generalised extreme value function has previously been used to describe the distribution of $T$-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of $T$-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same…
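For reference, the $T$-value in particle-physics applications of the energy test is commonly written as below. This form, the Gaussian weighting function $\psi$ and its distance scale $\delta$ are assumed from that literature rather than quoted from the abstract.

```latex
T = \frac{1}{n(n-1)} \sum_{i<j}^{n} \psi\left(|\vec{x}_i - \vec{x}_j|\right)
  + \frac{1}{m(m-1)} \sum_{i<j}^{m} \psi\left(|\vec{y}_i - \vec{y}_j|\right)
  - \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \psi\left(|\vec{x}_i - \vec{y}_j|\right),
\qquad
\psi(d) = e^{-d^2 / 2\delta^2}
```

Here $n$ and $m$ are the sizes of the samples $\{\vec{x}_i\}$ and $\{\vec{y}_j\}$. If both samples come from the same population and $\mu$ is the mean of $\psi$ between two independent draws, each within-sample term has expectation $\mu/2$ and the cross term has expectation $\mu$, so $T$ is centred on zero under the null hypothesis; a difference between the populations pushes $T$ towards positive values.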
