Combined Neyman-Pearson Chi-square: An Improved Approximation to the Poisson-likelihood Chi-square

Xiangpan Ji; Wenqiang Gu; Xin Qian; Hanyu Wei; Chao Zhang

arXiv:1903.07185·physics.data-an·February 26, 2020


TL;DR

This paper introduces a new approximation called combined Neyman-Pearson chi-square that reduces bias in parameter estimation compared to traditional chi-square methods, offering a computationally efficient alternative to the Poisson-likelihood chi-square.

Contribution

The paper proposes the combined Neyman-Pearson chi-square as an improved approximation to the Poisson-likelihood chi-square, with analytical and simulation validation.

Findings

01

The CNP chi-square ($\chi^{2}_{\mathrm{CNP}}$) reduces bias in parameter estimates.

02

$\chi^{2}_{\mathrm{CNP}}$ provides a computationally efficient alternative to the Poisson-likelihood chi-square.

03

Significant bias reduction compared to Neyman's or Pearson's chi-square.

Abstract

We describe an approximation to the widely-used Poisson-likelihood chi-square using a linear combination of Neyman's and Pearson's chi-squares, namely "combined Neyman-Pearson chi-square" ($\chi^{2}_{\mathrm{CNP}}$). Through analytical derivations and toy model simulations, we show that $\chi^{2}_{\mathrm{CNP}}$ leads to a significantly smaller bias on the best-fit model parameters compared to those using either Neyman's or Pearson's chi-square. When the computational cost of using the Poisson-likelihood chi-square is high, $\chi^{2}_{\mathrm{CNP}}$ provides a good alternative given its natural connection to the covariance matrix formalism.


Tables

Table 1: Comparison of the median 68% confidence intervals for the five test statistics: $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{Gauss}}$, $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, and $\chi^{2}_{\mathrm{CNP}}$. 10 million toy experiments are generated with the $n=10$ and $\mu_{\mathrm{true}}=15$ setting. For each toy experiment, a 68% confidence interval is obtained using the Feldman-Cousins approach. The reported lower limit $\hat{\mu}^{-\sigma}_{1/2}$ and upper limit $\hat{\mu}^{+\sigma}_{1/2}$ of the 68% confidence interval are the median values over all toy experiments.

Test statistic	Median 68% confidence interval $(\hat{\mu}^{-\sigma}_{1/2}, \hat{\mu}^{+\sigma}_{1/2})$	Interval size $\hat{\mu}^{+\sigma}_{1/2}-\hat{\mu}^{-\sigma}_{1/2}$
$\chi^{2}_{\mathrm{Poisson}}$	(13.839, 16.226)	2.387
$\chi^{2}_{\mathrm{Gauss}}$	(13.744, 16.221)	2.478
$\chi^{2}_{\mathrm{Neyman}}$	(12.236, 15.706)	3.471
$\chi^{2}_{\mathrm{Pearson}}$	(14.153, 16.800)	2.647
$\chi^{2}_{\mathrm{CNP}}$	(13.745, 16.196)	2.451

Equations

$L(\bm{\mu}(\bm{\theta});\bm{M})=\prod_{i}^{n}\frac{e^{-\mu_{i}}\mu_{i}^{M_{i}}}{M_{i}!}.$

$\lambda(\bm{\theta})=\frac{L(\bm{\mu}(\bm{\theta});\bm{M})}{\max L(\bm{\mu}^{\prime};\bm{M})}=\frac{L(\bm{\mu}(\bm{\theta});\bm{M})}{L(\bm{M};\bm{M})},$

$\chi^{2}_{\mathrm{Poisson}}=-2\ln\lambda(\bm{\theta})=2\sum_{i=1}^{n}\left(\mu_{i}(\bm{\theta})-M_{i}+M_{i}\ln\frac{M_{i}}{\mu_{i}(\bm{\theta})}\right).$

$L_{\mathrm{Gauss}}(\bm{\mu}(\bm{\theta});\bm{M})=\prod_{i}\frac{1}{\sqrt{2\pi\mu_{i}(\bm{\theta})}}\exp\left(-\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{2\mu_{i}(\bm{\theta})}\right).$

$\lambda_{\mathrm{Gauss}}(\bm{\theta})=\frac{L_{\mathrm{Gauss}}(\bm{\mu}(\bm{\theta});\bm{M})}{\max L_{\mathrm{Gauss}}(\bm{\mu}^{\prime};\bm{M})},$

$\chi^{2}_{\mathrm{Gauss}}=-2\ln\lambda_{\mathrm{Gauss}}(\bm{\theta})$

with $\mu_{i}^{\prime}$

$\chi^{2}_{\mathrm{Pearson}}=\sum_{i}\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{\mu_{i}(\bm{\theta})}.$

$\chi^{2}_{\mathrm{Neyman}}=\sum_{i}\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{M_{i}}.$

$\chi^{2}_{\mathrm{cov}}=(\bm{M}-\bm{\mu}(\bm{\theta}))^{T}\cdot V^{-1}\cdot(\bm{M}-\bm{\mu}(\bm{\theta})),$

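$\chi^{2}_{\mathrm{Poisson}}=2\sum_{i=1}^{n}\left(\mu-M_{i}+M_{i}\ln\frac{M_{i}}{\mu}\right),\quad\chi^{2}_{\mathrm{Neyman}}=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{M_{i}},\quad\chi^{2}_{\mathrm{Pearson}}=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{\mu}.$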

$\hat{\mu}_{\mathrm{Poisson}}=\frac{\sum_{i=1}^{n}M_{i}}{n},\quad\hat{\mu}_{\mathrm{Neyman}}=\frac{n}{\sum_{i=1}^{n}\frac{1}{M_{i}}},\quad\hat{\mu}_{\mathrm{Pearson}}=\sqrt{\frac{\sum_{i=1}^{n}M_{i}^{2}}{n}}.$

$\chi^{2}_{\mathrm{Poisson}}\approx\sum_{i=1}^{n}\left[\frac{(\mu-M_{i})^{2}}{M_{i}}-\frac{2}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right].$

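$\chi^{2}_{\mathrm{Poisson}}-\chi^{2}_{\mathrm{Neyman}}\approx\sum_{i=1}^{n}\left[-\frac{2}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right],\quad\chi^{2}_{\mathrm{Poisson}}-\chi^{2}_{\mathrm{Pearson}}\approx\sum_{i=1}^{n}\left[\frac{1}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right].$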

$\chi^{2}_{\mathrm{CNP}}\equiv\frac{1}{3}\left(\chi^{2}_{\mathrm{Neyman}}+2\chi^{2}_{\mathrm{Pearson}}\right)=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{3/\left(\frac{1}{M_{i}}+\frac{2}{\mu}\right)},$

$\hat{\mu}_{\mathrm{CNP}}=\sqrt[3]{\frac{\sum_{i=1}^{n}M_{i}^{2}}{\sum_{i=1}^{n}\frac{1}{M_{i}}}}=\sqrt[3]{\hat{\mu}_{\mathrm{Pearson}}^{2}\cdot\hat{\mu}_{\mathrm{Neyman}}},$

$\chi^{2}_{\mathrm{CNP}}=\sum_{i=1}^{n}\frac{(\mu_{i}(\bm{\theta},\bm{\eta})-M_{i})^{2}}{3/\left(\frac{1}{M_{i}}+\frac{2}{\mu_{i}(\bm{\theta},\bm{\eta})}\right)}+\sum_{m=1}^{K}\frac{\eta_{m}^{2}}{\sigma_{m}^{2}},$

$V_{ij}=V_{ij}^{\mathrm{stat}}+V_{ij}^{\mathrm{syst}},\quad V_{ij}^{\mathrm{syst}}=\sum_{m}^{K}\sigma_{m}^{2}s_{mi}s_{mj}.$

$(\chi^{2}_{\mathrm{CNP}})_{\mathrm{cov}}=(\bm{M}-\bm{\mu}(\bm{\theta}))^{T}\cdot\left(V_{\mathrm{CNP}}^{\mathrm{stat}}(\bm{\theta})+V^{\mathrm{syst}}\right)^{-1}\cdot(\bm{M}-\bm{\mu}(\bm{\theta})),$

$V_{\mathrm{CNP}}^{\mathrm{stat}}(\bm{\theta})_{ij}\equiv\frac{3}{\frac{1}{M_{i}}+\frac{2}{\mu_{i}(\bm{\theta})}}\,\delta_{ij}.$

$\chi^{2}_{\mathrm{Poisson}}=$

$+2\sum_{d=1}^{100}\sum_{i=1}^{16}\left(b_{d}^{i}-B_{d}^{i}+B_{d}^{i}\ln\frac{B_{d}^{i}}{b_{d}^{i}}\right)+\sum_{d=1}^{100}\left(\frac{\epsilon_{d}}{0.02}\right)^{2},$

$\chi^{2}_{\mathrm{CNP}}=\sum_{d=1}^{100}\sum_{i=1}^{16}\frac{\left(\mu_{d}^{i}(1+\epsilon+\epsilon_{d})+b_{d}^{i}-M_{d}^{i}\right)^{2}}{3/\left(\frac{1}{M_{d}^{i}}+\frac{2}{\mu_{d}^{i}(1+\epsilon+\epsilon_{d})+b_{d}^{i}}\right)}+\sum_{d=1}^{100}\sum_{i=1}^{16}\frac{(b_{d}^{i}-B_{d}^{i})^{2}}{3/\left(\frac{1}{B_{d}^{i}}+\frac{2}{b_{d}^{i}}\right)}+\sum_{d=1}^{100}\left(\frac{\epsilon_{d}}{0.02}\right)^{2}.$

$\chi^{2}_{\mathrm{CNP}}=$

$+\sum_{d=1}^{10}\left(\frac{\epsilon_{d}}{0.02}\right)^{2}.$

$(\chi^{2}_{\mathrm{CNP}})_{\mathrm{cov}}=$

$+(\bm{b}-\bm{B})^{T}\cdot(V_{\mathrm{CNP}}^{\mathrm{bkg}})^{-1}\cdot(\bm{b}-\bm{B}),$

$\chi^{\prime 2}_{\mathrm{CNP}}=$

$+\sum_{d=1}^{10}\left(\frac{\epsilon_{d}}{0.02}\right)^{2}$

$(\chi^{\prime 2}_{\mathrm{CNP}})_{\mathrm{cov}}=$

$\cdot\left(\bm{\mu}(1+\epsilon)+R\cdot\bm{B}-\bm{M}\right),$


Code & Models

Repositories

Wouter-VDP/nuecc_python


Full text


Xiangpan Ji, Wenqiang Gu, Xin Qian, Hanyu Wei, Chao Zhang

Physics Department, Brookhaven National Laboratory, Upton, NY, USA


Keywords: test statistics, Poisson-likelihood chi-square, Neyman’s chi-square, Pearson’s chi-square

Journal: Nuclear Instruments and Methods A

1 Introduction

In high-energy physics experiments, it is often convenient to bin the data into a histogram with $n$ bins. The number of measured events $M_{i}$ in each bin typically follows a Poisson distribution with the mean value $\mu_{i}({\bm{\theta}})$ predicted by a set of model parameters ${\bm{\theta}}=(\theta_{1},...,\theta_{N})$ . The likelihood function of this Poisson histogram can be written as:

$L(\bm{\mu}(\bm{\theta});\bm{M})=\prod_{i}^{n}\frac{e^{-\mu_{i}}\mu_{i}^{M_{i}}}{M_{i}!}.$ (1)

A maximum-likelihood estimator (MLE) of ${\bm{\theta}}$ can be constructed by maximizing the likelihood ratio [1, 2]

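$\lambda(\bm{\theta})=\frac{L(\bm{\mu}(\bm{\theta});\bm{M})}{\max L(\bm{\mu}^{\prime};\bm{M})}=\frac{L(\bm{\mu}(\bm{\theta});\bm{M})}{L(\bm{M};\bm{M})},$ (2)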

where the denominator is a model-independent constant that maximizes the likelihood of the data without any restriction on the model (Footnote 1: While the estimation of model parameters ${\bm{\theta}}$ does not depend on the denominator of the likelihood ratio, the chi-square test statistic constructed in this way, such as that in Eq. (3), can be used to examine the data-model compatibility with a goodness-of-fit test). Maximizing this likelihood ratio is equivalent to minimizing the Poisson-likelihood chi-square function [3, 4]:

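$\chi^{2}_{\mathrm{Poisson}}=-2\ln\lambda(\bm{\theta})=2\sum_{i=1}^{n}\left(\mu_{i}(\bm{\theta})-M_{i}+M_{i}\ln\frac{M_{i}}{\mu_{i}(\bm{\theta})}\right).$ (3)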
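Eq. (3) can be illustrated with a direct per-bin sum in Python. This is a minimal sketch, not the authors' code. The function name `chi2_poisson` is our own. Bins with $M_{i}=0$ use the mathematical limit $M_{i}\ln M_{i}\to0$; the paper's own treatment of such bins (Appendix A) is not reproduced here.

```python
import math


def chi2_poisson(mu, M):
    """Eq. (3): 2 * sum_i [mu_i - M_i + M_i ln(M_i / mu_i)].

    mu: predicted means (all > 0); M: observed counts (>= 0).
    For M_i = 0 the logarithmic term is replaced by its limit, 0.
    """
    total = 0.0
    for m_pred, m_obs in zip(mu, M):
        term = m_pred - m_obs
        if m_obs > 0:
            term += m_obs * math.log(m_obs / m_pred)
        total += term
    return 2.0 * total


print(chi2_poisson([10.0, 12.0], [10, 12]))  # 0.0: prediction equals data
print(chi2_poisson([10.0, 12.0], [13, 9]))
```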

The MLE is commonly used in high-energy physics because it is generally asymptotically unbiased and has the advantages of being consistent and efficient [5].

With large statistics, the Poisson distribution above can be approximated by a normal (Gaussian) distribution with mean $\mu_{i}({\bm{\theta}})$ and variance $\sigma_{i}^{2}=\mu_{i}({\bm{\theta}})$. The likelihood then becomes:

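$L_{\mathrm{Gauss}}(\bm{\mu}(\bm{\theta});\bm{M})=\prod_{i}\frac{1}{\sqrt{2\pi\mu_{i}(\bm{\theta})}}\exp\left(-\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{2\mu_{i}(\bm{\theta})}\right).$ (4)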

The Gauss-MLE can be similarly constructed through a likelihood ratio:

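$\lambda_{\mathrm{Gauss}}(\bm{\theta})=\frac{L_{\mathrm{Gauss}}(\bm{\mu}(\bm{\theta});\bm{M})}{\max L_{\mathrm{Gauss}}(\bm{\mu}^{\prime};\bm{M})},$ (5)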

where the denominator is the maximum of $L_{\textrm{Gauss}}$ without any restriction on the model, and can be derived by calculating $\partial{L_{\textrm{Gauss}}}/\partial{\mu_{i}^{{}^{\prime}}}=0$ . Maximizing $\lambda_{\textrm{Gauss}}({\bm{\theta}})$ is equivalent to minimizing the Gauss-likelihood chi-square function

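$\chi^{2}_{\mathrm{Gauss}}=-2\ln\lambda_{\mathrm{Gauss}}(\bm{\theta})=\sum_{i}\left[\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{\mu_{i}(\bm{\theta})}+\ln\frac{\mu_{i}(\bm{\theta})}{\mu_{i}^{\prime}}-\frac{(\mu_{i}^{\prime}-M_{i})^{2}}{\mu_{i}^{\prime}}\right],\quad\mu_{i}^{\prime}=\frac{\sqrt{1+4M_{i}^{2}}-1}{2}.$ (6)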
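The following is a sketch of $\chi^{2}_{\mathrm{Gauss}}$ based on our own derivation of the unconstrained maximum. Setting $\partial L_{\textrm{Gauss}}/\partial\mu_{i}^{\prime}=0$ for a single bin gives $\mu_{i}^{\prime 2}+\mu_{i}^{\prime}-M_{i}^{2}=0$, and the code uses the positive root. The function names are hypothetical, and all $M_{i}>0$ is assumed so that the logarithms are defined.

```python
import math


def mu_prime(m_obs):
    """Positive root of mu'^2 + mu' - M^2 = 0: the unconstrained maximum of L_Gauss."""
    return (math.sqrt(1.0 + 4.0 * m_obs * m_obs) - 1.0) / 2.0


def neg2_log_gauss(mu, m_obs):
    """-2 ln of one bin of the Gaussian likelihood: ln(2 pi mu) + (mu - M)^2 / mu."""
    return math.log(2.0 * math.pi * mu) + (mu - m_obs) ** 2 / mu


def chi2_gauss(mu, M):
    """-2 ln [L_Gauss(mu) / L_Gauss(mu')], summed over bins (requires M_i > 0)."""
    return sum(neg2_log_gauss(m_pred, m_obs) - neg2_log_gauss(mu_prime(m_obs), m_obs)
               for m_pred, m_obs in zip(mu, M))


print(mu_prime(7))               # slightly below M = 7
print(chi2_gauss([10.0], [12]))
```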

While the Gauss-likelihood chi-square is relatively well known (see e.g. [6, 7]) (Footnote 2: We further provide some relevant formulas for the Gauss-likelihood chi-square in Appendix D), interestingly, it is not widely used in high-energy physics experiments. Instead, a direct chi-square test statistic, namely Pearson’s chi-square, is constructed through:

$\chi^{2}_{\mathrm{Pearson}}=\sum_{i}\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{\mu_{i}(\bm{\theta})}.$ (7)

Comparing with Eq. (6), we see $\chi^{2}_{\mathrm{Pearson}}$ consists of only the first term in $\chi^{2}_{\mathrm{Gauss}}$ . These two chi-squares become asymptotically equivalent when $M_{i}$ is large.

In practice, the variance $\sigma_{i}^{2}$ is often approximated by the measured value $M_{i}$, which is independent of the model parameters. This leads to another popular chi-square test statistic in high-energy physics experiments, namely Neyman’s chi-square:

$\chi^{2}_{\mathrm{Neyman}}=\sum_{i}\frac{(\mu_{i}(\bm{\theta})-M_{i})^{2}}{M_{i}}.$ (8)
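Pearson's and Neyman's chi-squares differ only in which quantity serves as the variance. Below is a minimal Python sketch of both; the function names are hypothetical, and $M_{i}>0$ is assumed for Neyman's form.

```python
def chi2_pearson(mu, M):
    """Pearson: the predicted mean mu_i serves as the variance."""
    return sum((m_pred - m_obs) ** 2 / m_pred for m_pred, m_obs in zip(mu, M))


def chi2_neyman(mu, M):
    """Neyman: the observed count M_i serves as the variance (requires M_i > 0)."""
    return sum((m_pred - m_obs) ** 2 / m_obs for m_pred, m_obs in zip(mu, M))


mu, M = [10.0, 10.0], [8, 13]
print(chi2_pearson(mu, M))  # (4 + 9) / 10
print(chi2_neyman(mu, M))   # 4/8 + 9/13
```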

Compared with the MLE from the Poisson-likelihood chi-square, estimators of model parameters constructed from Pearson’s or Neyman’s chi-square are known to be biased, especially when the large-statistics condition is not met [4, 8, 9]. Despite this shortcoming, both $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{Neyman}}$ are commonly used in physics data analysis, partly because of their close connection to the covariance-matrix formalism:

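$\chi^{2}_{\mathrm{cov}}=(\bm{M}-\bm{\mu}(\bm{\theta}))^{T}\cdot V^{-1}\cdot(\bm{M}-\bm{\mu}(\bm{\theta})),$ (9)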

where $V_{ij}=\mathrm{cov}[\mu_{i},\mu_{j}]$ is the covariance matrix of the prediction. It can often be calculated through Monte Carlo methods based on the statistical and systematic uncertainties of the experiment prior to the minimization of $\chi^{2}_{\mathrm{cov}}$. In situations where many nuisance parameters [5] are required in the likelihood function $L$ as in Eq. (1), the covariance-matrix format of Eq. (9) has the natural advantage of reducing the number of nuisance parameters, thus leading to a faster minimization of the $\chi^{2}$ function.
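The covariance-matrix form can be sketched without external libraries by solving $V\bm{x}=\bm{M}-\bm{\mu}$ rather than inverting $V$ explicitly. The small Gaussian-elimination solver below is our own illustration and is suitable only for small, well-conditioned matrices. With $V=\mathrm{diag}(M_{i})$ or $V=\mathrm{diag}(\mu_{i})$ the result reduces to Neyman's or Pearson's chi-square, respectively.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def chi2_cov(mu, M, V):
    """(M - mu)^T V^{-1} (M - mu), via a linear solve instead of an explicit inverse."""
    r = [m_obs - m_pred for m_pred, m_obs in zip(mu, M)]
    return sum(ri * xi for ri, xi in zip(r, solve(V, r)))


mu, M = [10.0, 12.0, 9.0], [8, 13, 11]
V_neyman = [[8, 0, 0], [0, 13, 0], [0, 0, 11]]  # V = diag(M_i) reproduces Neyman's form
print(chi2_cov(mu, M, V_neyman))
```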

One method to remove the bias of the estimator from $\chi^{2}_{\mathrm{Pearson}}$ is an iterative weighted least-squares fit, in which the variance in each round of $\chi^{2}_{\mathrm{Pearson}}$ minimization is replaced by the prediction from the best-fit value of the previous round [10, 11, 12]. Several modified chi-square test statistics have also been proposed in the literature to mitigate the bias issue. For example, $\chi^{2}_{\mathrm{Gauss}}$ defined in Eq. (6) is a good replacement for $\chi^{2}_{\mathrm{Pearson}}$ when the number of measurements is large. Similarly, $\chi^{2}_{\gamma}$ as proposed by Mighell [13] is a good alternative to $\chi^{2}_{\mathrm{Neyman}}$ when the number of measurements is large. Both $\chi^{2}_{\mathrm{Gauss}}$ and $\chi^{2}_{\gamma}$, however, still lead to biases when the number of measurements is small. Redin proposed a solution by including a cubic term in $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ [14], or by reporting a weighted average of the fitting results from $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ [15].

In this paper, we propose a new method through the construction of a chi-square test statistic ($\chi^{2}_{\mathrm{CNP}}$) from a linear combination of Neyman’s and Pearson’s chi-squares. As an improved approximation to the Poisson-likelihood chi-square compared with either Neyman’s or Pearson’s chi-square, $\chi^{2}_{\mathrm{CNP}}$ significantly reduces the bias while keeping the advantage of the covariance-matrix formalism. This paper is organized as follows. The construction of $\chi^{2}_{\mathrm{CNP}}$ and its covariance-matrix format are described in Sec. 2. Three toy examples are presented in Sec. 3 to illustrate the features and advantages of $\chi^{2}_{\mathrm{CNP}}$. Finally, we summarize the recommended usage in the data analysis of counting experiments in Sec. 4.

2 Combined Neyman–Pearson Chi-square ( $\chi^{2}_{\mathrm{CNP}}$ )

The bias in the estimator of the model parameters $\bm{\theta}$ using Neyman’s or Pearson’s chi-square can be traced back to how each $\chi^{2}$ definition approximates the Poisson-likelihood chi-square. To illustrate this, we start with a simple example. A set of $n$ independent counting experiments is performed to measure a common expected value $\mu$, and the $i$-th experiment measures $M_{i}$ events. The three chi-square functions in this case are (Footnote 3: The treatment of bins where $M_{i}=0$ is described in Appendix A):

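$\chi^{2}_{\mathrm{Poisson}}=2\sum_{i=1}^{n}\left(\mu-M_{i}+M_{i}\ln\frac{M_{i}}{\mu}\right),\quad\chi^{2}_{\mathrm{Neyman}}=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{M_{i}},\quad\chi^{2}_{\mathrm{Pearson}}=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{\mu}.$ (10)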

$\hat{\mu}$ (the estimator of $\mu$ ) can be calculated through the minimization of Eq. (10): $\partial\chi^{2}/\partial\mu=0$ . We obtain:

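$\hat{\mu}_{\mathrm{Poisson}}=\frac{\sum_{i=1}^{n}M_{i}}{n},\quad\hat{\mu}_{\mathrm{Neyman}}=\frac{n}{\sum_{i=1}^{n}\frac{1}{M_{i}}},\quad\hat{\mu}_{\mathrm{Pearson}}=\sqrt{\frac{\sum_{i=1}^{n}M_{i}^{2}}{n}}.$ (11)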

Given Eq. (11), it is straightforward to show that $\hat{\mu}_{\mathrm{Neyman}}\leq\hat{\mu}_{\mathrm{Poisson}}\leq\hat{\mu}_{\mathrm{Pearson}}$, since these are the harmonic, arithmetic, and quadratic means of the $M_{i}$, respectively; equality holds only when all values of $M_{i}$ are the same. Since $\hat{\mu}_{\mathrm{Poisson}}$ is unbiased in this simple example, $\hat{\mu}_{\mathrm{Pearson}}$ and $\hat{\mu}_{\mathrm{Neyman}}$ are biased in opposite directions.
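The ordering above can be checked numerically with the closed-form estimators. The square root in the Pearson estimator follows from $\partial\chi^{2}_{\mathrm{Pearson}}/\partial\mu=\sum_{i}(1-M_{i}^{2}/\mu^{2})=0$. This is a short sketch with hypothetical function names and an arbitrary example data set.

```python
import math


def mu_hat_poisson(M):
    """Arithmetic mean of the counts."""
    return sum(M) / len(M)


def mu_hat_neyman(M):
    """Harmonic mean of the counts (requires all M_i > 0)."""
    return len(M) / sum(1.0 / m for m in M)


def mu_hat_pearson(M):
    """Quadratic (root-mean-square) mean of the counts."""
    return math.sqrt(sum(m * m for m in M) / len(M))


M = [12, 15, 18, 14]
print(mu_hat_neyman(M), mu_hat_poisson(M), mu_hat_pearson(M))
```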

We further examine the difference in chi-square values. Assuming that $M_{i}$ and $\mu$ are reasonably large so that $M_{i}$ is close to $\mu$ , a Taylor expansion of $\chi^{2}_{\mathrm{Poisson}}$ yields:

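$\chi^{2}_{\mathrm{Poisson}}\approx\sum_{i=1}^{n}\left[\frac{(\mu-M_{i})^{2}}{M_{i}}-\frac{2}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right].$ (12)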
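For reference, the per-bin steps behind the Taylor expansion of $\chi^{2}_{\mathrm{Poisson}}$ are shown below. This is our own expansion in $x_{i}=\mu-M_{i}$ for $|x_{i}|\ll M_{i}$. The last line expands the Pearson term in the same variable; together with the exact Neyman term $x_{i}^{2}/M_{i}$, it gives the differences used next.

```latex
\begin{aligned}
M_i \ln\frac{M_i}{\mu} &= -M_i \ln\Big(1+\frac{x_i}{M_i}\Big)
  = -x_i + \frac{x_i^{2}}{2M_i} - \frac{x_i^{3}}{3M_i^{2}} + \frac{x_i^{4}}{4M_i^{3}} - \cdots \\
2\Big(\mu - M_i + M_i \ln\frac{M_i}{\mu}\Big) &= \frac{x_i^{2}}{M_i} - \frac{2}{3}\frac{x_i^{3}}{M_i^{2}} + \frac{1}{2}\frac{x_i^{4}}{M_i^{3}} - \cdots \\
\frac{x_i^{2}}{\mu} = \frac{x_i^{2}}{M_i}\Big(1+\frac{x_i}{M_i}\Big)^{-1} &= \frac{x_i^{2}}{M_i} - \frac{x_i^{3}}{M_i^{2}} + \frac{x_i^{4}}{M_i^{3}} - \cdots
\end{aligned}
```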

From Eq. (12), it is straightforward to deduce:

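$\chi^{2}_{\mathrm{Poisson}}-\chi^{2}_{\mathrm{Neyman}}\approx\sum_{i=1}^{n}\left[-\frac{2}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right],\quad\chi^{2}_{\mathrm{Poisson}}-\chi^{2}_{\mathrm{Pearson}}\approx\sum_{i=1}^{n}\left[\frac{1}{3}\frac{(\mu-M_{i})^{3}}{M_{i}^{2}}+O\left(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}}\right)\right].$ (13)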
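The cubic coefficients $-2/3$ and $+1/3$ imply that weighting Neyman's and Pearson's chi-squares 1:2 cancels the cubic term. The remaining difference from $\chi^{2}_{\mathrm{Poisson}}$ should therefore scale as the fourth power of $\mu-M$. The single-bin check below is our own, with an arbitrarily chosen $M=100$. Halving $\mu-M$ should shrink the Neyman and Pearson discrepancies by roughly $2^{3}=8$ and the combined discrepancy by roughly $2^{4}=16$.

```python
import math


def bin_terms(mu, M):
    """Single-bin values of the Poisson, Neyman, Pearson, and 1:2 combined chi-squares."""
    poisson = 2.0 * (mu - M + M * math.log(M / mu))
    neyman = (mu - M) ** 2 / M
    pearson = (mu - M) ** 2 / mu
    cnp = (neyman + 2.0 * pearson) / 3.0
    return poisson, neyman, pearson, cnp


def differences(delta, M=100.0):
    """Poisson minus (Neyman, Pearson, combined) for mu = M + delta."""
    p, ney, pea, cnp = bin_terms(M + delta, M)
    return p - ney, p - pea, p - cnp


for delta in (8.0, 4.0, 2.0):
    print(delta, ["%.3e" % d for d in differences(delta)])
```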

Naturally, we can define a new chi-square function as a linear combination of Neyman’s and Pearson’s chi-squares:

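$\chi^{2}_{\mathrm{CNP}}\equiv\frac{1}{3}\left(\chi^{2}_{\mathrm{Neyman}}+2\chi^{2}_{\mathrm{Pearson}}\right)=\sum_{i=1}^{n}\frac{(\mu-M_{i})^{2}}{3/\left(\frac{1}{M_{i}}+\frac{2}{\mu}\right)},$ (14)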

which agrees with $\chi^{2}_{\mathrm{Poisson}}$ up to terms of $O(\frac{(\mu-M_{i})^{4}}{M_{i}^{3}})$, a better approximation than either $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$ alone. In this example, the estimator $\hat{\mu}$ obtained by minimizing $\chi^{2}_{\mathrm{CNP}}$ is:

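$\hat{\mu}_{\mathrm{CNP}}=\sqrt[3]{\frac{\sum_{i=1}^{n}M_{i}^{2}}{\sum_{i=1}^{n}\frac{1}{M_{i}}}}=\sqrt[3]{\hat{\mu}_{\mathrm{Pearson}}^{2}\cdot\hat{\mu}_{\mathrm{Neyman}}},$ (15)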

which is the geometric mean of $\hat{\mu}_{\mathrm{Pearson}}$ (counted twice) and $\hat{\mu}_{\mathrm{Neyman}}$ (counted once). Since the biases of $\hat{\mu}_{\mathrm{Pearson}}$ and $\hat{\mu}_{\mathrm{Neyman}}$ are in opposite directions, this weighted combination largely cancels them, so $\hat{\mu}_{\mathrm{CNP}}$ has a reduced bias.
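The closed form of $\hat{\mu}_{\mathrm{CNP}}$ follows from $\partial\chi^{2}_{\mathrm{CNP}}/\partial\mu=0$, which reduces to $\mu^{3}\sum_{i}(1/M_{i})=\sum_{i}M_{i}^{2}$. The sketch below evaluates both forms of that expression. It also lets one confirm numerically that the result minimizes the combined chi-square for a common mean. The function names and example counts are hypothetical, and $M_{i}>0$ is assumed.

```python
import math


def mu_hat_cnp(M):
    """Cube root of sum(M_i^2) / sum(1/M_i); requires all M_i > 0."""
    return (sum(m * m for m in M) / sum(1.0 / m for m in M)) ** (1.0 / 3.0)


def chi2_cnp_common(mu, M):
    """Combined chi-square for a common mean: (1/3) * sum (mu - M_i)^2 (1/M_i + 2/mu)."""
    return sum((mu - m) ** 2 * (1.0 / m + 2.0 / mu) for m in M) / 3.0


M = [12, 15, 18, 14]
h = mu_hat_cnp(M)
pearson = math.sqrt(sum(m * m for m in M) / len(M))
neyman = len(M) / sum(1.0 / m for m in M)
print(h, (pearson ** 2 * neyman) ** (1.0 / 3.0))  # the two forms agree
```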

More generally, when model parameters and systematic uncertainties are included, the $\chi^{2}_{\mathrm{CNP}}$ can be written as:

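$\chi^{2}_{\mathrm{CNP}}=\sum_{i=1}^{n}\frac{(\mu_{i}(\bm{\theta},\bm{\eta})-M_{i})^{2}}{3/\left(\frac{1}{M_{i}}+\frac{2}{\mu_{i}(\bm{\theta},\bm{\eta})}\right)}+\sum_{m=1}^{K}\frac{\eta_{m}^{2}}{\sigma_{m}^{2}},$ (16)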

where $\bm{\theta}=\{\theta_{k}|k=1,...,N\}$ are model parameters, and $\bm{\eta}=\{\eta_{m}|m=1,...,K\}$ are nuisance parameters representing systematic uncertainties, constrained by their corresponding standard deviations ($\sigma_{m}$). As an improved approximation to $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{CNP}}$ in Eq. (16) naturally leads to a smaller bias than $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$ in estimating the model parameters $\bm{\theta}$, such as the normalization or the shape of the histograms.

It is worth noting that in $\chi^{2}_{\mathrm{CNP}}$, the variance of the Gaussian distribution for the $i$-th bin is approximated as $3/(\frac{1}{M_{i}}+\frac{2}{\mu_{i}})$, a weighted harmonic mean of $M_{i}$ and $\mu_{i}$ with weights 1 and 2, while for $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ it is $M_{i}$ and $\mu_{i}$, respectively. From this we can further deduce the covariance-matrix format of $\chi^{2}_{\mathrm{CNP}}$. Following Ref. [16], when $\mu_{i}$ can be approximated as depending linearly on the nuisance parameters, $\mu_{i}=\mu_{i}^{0}+\sum_{m}^{K}\eta_{m}s_{mi}$, the chi-square with pull terms (e.g. Eq. (16)) is equivalent to the chi-square in the covariance-matrix format (Eq. (9)). In this case, the covariance matrix $V$ can be written as

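$V_{ij}=V_{ij}^{\mathrm{stat}}+V_{ij}^{\mathrm{syst}},\quad V_{ij}^{\mathrm{syst}}=\sum_{m}^{K}\sigma_{m}^{2}s_{mi}s_{mj}.$ (17)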

Therefore, the covariance matrix format of $\chi^{2}_{\mathrm{CNP}}$ becomes:

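$(\chi^{2}_{\mathrm{CNP}})_{\mathrm{cov}}=(\bm{M}-\bm{\mu}(\bm{\theta}))^{T}\cdot\left(V_{\mathrm{CNP}}^{\mathrm{stat}}(\bm{\theta})+V^{\mathrm{syst}}\right)^{-1}\cdot(\bm{M}-\bm{\mu}(\bm{\theta})),$ (18)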

where

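$V_{\mathrm{CNP}}^{\mathrm{stat}}(\bm{\theta})_{ij}\equiv\frac{3}{\frac{1}{M_{i}}+\frac{2}{\mu_{i}(\bm{\theta})}}\,\delta_{ij}.$ (19)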
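The equivalence between the pull-term form and the covariance-matrix form can be checked numerically in the simplest case. The check uses one nuisance parameter, with $\mu_{i}=\mu_{i}^{0}+\eta s_{i}$ and $V^{\mathrm{syst}}=\sigma^{2}\bm{s}\bm{s}^{T}$. As in the statistical covariance defined above, the CNP variances are frozen at the nominal prediction $\mu_{i}^{0}$ in both forms. Profiling $\eta$ in the pull form should then reproduce the covariance-matrix value. The data, shifts, helper names, and the small solver are our own illustrative assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def cnp_stat_variance(M, mu0):
    """Diagonal of V_CNP^stat evaluated at the nominal prediction mu0."""
    return [3.0 / (1.0 / m + 2.0 / u) for m, u in zip(M, mu0)]


def chi2_cnp_cov(M, mu0, s, sigma):
    """Covariance form with V = V_CNP^stat + sigma^2 s s^T (one nuisance parameter)."""
    n = len(M)
    v = cnp_stat_variance(M, mu0)
    V = [[sigma ** 2 * s[i] * s[j] + (v[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    r = [m - u for m, u in zip(M, mu0)]
    return sum(ri * xi for ri, xi in zip(r, solve(V, r)))


def chi2_cnp_pull_profiled(M, mu0, s, sigma):
    """Pull-term form with mu_i = mu0_i + eta * s_i and frozen variances, minimized over eta."""
    v = cnp_stat_variance(M, mu0)
    d = [u - m for m, u in zip(M, mu0)]
    # chi2(eta) = sum (d_i + eta s_i)^2 / v_i + eta^2 / sigma^2 is quadratic in eta.
    a = sum(si * si / vi for si, vi in zip(s, v)) + 1.0 / sigma ** 2
    b = sum(di * si / vi for di, si, vi in zip(d, s, v))
    eta = -b / a
    return sum((di + eta * si) ** 2 / vi for di, si, vi in zip(d, s, v)) + eta ** 2 / sigma ** 2


M = [48, 55, 61, 40]
mu0 = [50.0, 50.0, 55.0, 45.0]
s = [2.0, 3.0, 4.0, 1.0]
print(chi2_cnp_cov(M, mu0, s, 1.5), chi2_cnp_pull_profiled(M, mu0, s, 1.5))
```

The agreement relies on freezing the variances. If the CNP variances were allowed to follow $\eta$, the two forms would differ slightly, which is the approximation discussed next.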

Note that in Eq. (19) we have approximated $\mu_{i}(\bm{\theta},\bm{\eta})\approx\mu_{i}(\bm{\theta})$ by fixing the nuisance parameters at their externally constrained (i.e. nominal) values. This is necessary because the above derivation requires the uncertainties to be independent of the nuisance parameters $\bm{\eta}$ [16].

While the biases of Neyman’s and Pearson’s chi-squares are well known [4, 8, 9], the construction of $\chi^{2}_{\mathrm{CNP}}$ is, interestingly, new. This may be partly because in low-statistics experiments, where $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$ leads to a large bias, the Poisson-likelihood chi-square is generally used instead. $\chi^{2}_{\mathrm{CNP}}$, however, provides certain advantages in situations where either the number of nuisance parameters is large or the likelihood function is difficult to write down analytically. In the next section, we demonstrate the features and advantages of $\chi^{2}_{\mathrm{CNP}}$ with three toy examples of increasing complexity. Before doing so, we briefly discuss the expected performance of $\chi^{2}_{\mathrm{CNP}}$ with respect to two other common properties of a test statistic: goodness of fit and interval estimation.

2.1 Goodness of fit

In a goodness-of-fit test, the test statistic (e.g. $\chi^{2}_{\mathrm{Poisson}}$) is evaluated at the estimator $\hat{\mu}$ (i.e. the best-fit value of $\mu$). Assuming that it follows a chi-square distribution with the corresponding number of degrees of freedom, a p-value can be calculated to evaluate the compatibility between the data and the model. Although $\chi^{2}_{\mathrm{CNP}}$ can be used to perform such a test, it holds no particular advantage over the preferred choice of $\chi^{2}_{\mathrm{Pearson}}$ [6]. As shown in Fig. 3 in Sec. 3.1, the distributions of $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{Gauss}}$, $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, and $\chi^{2}_{\mathrm{CNP}}$ all deviate from the ideal chi-square distribution at low values of $\mu_{\mathrm{true}}$, while $\chi^{2}_{\mathrm{Pearson}}$ deviates the least. In addition, the mean of the $\chi^{2}_{\mathrm{Pearson}}$ distribution equals the number of degrees of freedom for all values of $\mu_{\mathrm{true}}$. Therefore, following Ref. [6], we recommend using $\chi^{2}_{\mathrm{Pearson}}$ together with the least-biased estimator $\hat{\mu}$ (from e.g. $\chi^{2}_{\mathrm{Poisson}}$ or $\chi^{2}_{\mathrm{CNP}}$) to perform the goodness-of-fit test.

2.2 Interval estimation

It is well known that the construction of confidence intervals in the frequentist approach depends not only on the choice of the test statistic $T$ but also on the procedure used. Within the high-energy physics community, there are two popular procedures for setting confidence intervals, which we describe below.

The first procedure is based on Wilks’ theorem [17]. The confidence interval is set by placing a threshold $c$ on $\Delta T\left(\mu\right)=T\left(\mu\right)-T_{min}$, where $\mu$, $T\left(\mu\right)$, and $T_{min}$ are the parameter of interest, the test statistic evaluated at $\mu$, and the global minimum of $T\left(\mu\right)$ over all model parameters, respectively. Wilks proved that the negative-two-log-likelihood-ratio test statistic $\Delta T$ follows a chi-square distribution, and the estimator $\hat{\mu}$ follows a normal distribution centered on the true value $\mu_{\textrm{true}}$, under three conditions: i) the two hypotheses are nested; ii) the parameters of the larger hypothesis (e.g. $T\left(\mu\right)$) are all uniquely defined in the smaller hypothesis (e.g. $T_{min}$) and do not lie on the boundary of the allowed region; and iii) the data are in the asymptotic regime. Consequently, the threshold $c$ can be calculated directly. For instance, assuming $\Delta T$ follows a chi-square distribution with one degree of freedom, the thresholds for the 68%, 95%, and 99.7% confidence intervals are 1, 4, and 9, respectively. With this procedure, the correctness of the confidence-interval coverage depends on the validity of Wilks’ theorem. As demonstrated in Eq. (13) and Eq. (14), $\chi^{2}_{\mathrm{CNP}}$ is an improved approximation to the negative-two-log-likelihood-ratio of the Poisson distribution (i.e. $\chi^{2}_{\mathrm{Poisson}}$). It also leads to a smaller bias in the estimator $\hat{\mu}$ than $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$. Therefore, the conditions of Wilks’ theorem are better met with $\chi^{2}_{\mathrm{CNP}}$, which means the chi-square distribution is a better approximation to the $\Delta T$ distribution from $\chi^{2}_{\mathrm{CNP}}$. Fig. 5 in Sec. 3.1 shows one such example. Consequently, under this procedure we expect more accurate confidence-interval coverage from $\chi^{2}_{\mathrm{CNP}}$ than from $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$.
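The thresholds 1, 4, and 9 follow from writing a one-degree-of-freedom chi-square variable as $Z^{2}$, with $Z$ a standard normal variable. Then $P(\Delta T\le c)=P(|Z|\le\sqrt{c})=\mathrm{erf}(\sqrt{c/2})$. A quick check using only the Python standard library:

```python
import math


def chi2_1dof_cdf(c):
    """P(Delta T <= c) for a chi-square variable with one degree of freedom.

    Delta T = Z^2 with Z standard normal, so P(Z^2 <= c) = erf(sqrt(c / 2)).
    """
    return math.erf(math.sqrt(c / 2.0))


for c in (1, 4, 9):
    print(c, round(chi2_1dof_cdf(c), 4))  # 0.6827, 0.9545, 0.9973
```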

The second procedure is commonly referred to as the Feldman-Cousins approach [18] in the high-energy physics community. In this procedure, the confidence interval is constructed strictly following the frequentist definition (Neyman construction). It uses an ordering principle based on the value of the likelihood-ratio test statistic (i.e. $\Delta T\left(\mu\right)=T\left(\mu\right)-T_{min}$ with $T=\chi^{2}_{\mathrm{Poisson}}$ for counting experiments) to ensure proper frequentist coverage. Sec. 3.1 shows an example of this procedure with a toy experiment. Similarly, the procedure can be defined with an ordering principle based on other $\Delta T$ test statistics (e.g. $T=\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, or $\chi^{2}_{\mathrm{CNP}}$), and the resulting confidence intervals would in general also have proper coverage. In this case, while all of the coverages are proper, a better test statistic is expected to yield a smaller confidence interval (or, with more parameters, a smaller area or volume). As shown in Table 1 of Sec. 3.1, the confidence interval constructed using $\chi^{2}_{\mathrm{CNP}}$ is smaller than those using $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$. This is partly caused by the reduced bias in the estimator $\hat{\mu}$ using $\chi^{2}_{\mathrm{CNP}}$, as further discussed in Sec. 3.1.

We note that there are other procedures for setting confidence intervals that are less affected by certain properties of the test statistics. For example, since the bias ($\delta\mu$) of an estimator $\hat{\mu}$ can be evaluated with a Monte Carlo method, one can define an alternative test statistic $\Delta T^{\prime}\left(\mu\right)=T\left(\mu+\delta\mu\right)-T_{min}$. The confidence interval constructed using $\Delta T^{\prime}$, with either the thresholding approach based on Wilks’ theorem or the Feldman-Cousins approach, would naturally be less affected by the bias. It would also perform better than one constructed using $\Delta T$, at the cost of increased computation.

3 Performance of $\chi^{2}_{\mathrm{CNP}}$

In this section, we compare the performance of $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, and $\chi^{2}_{\mathrm{CNP}}$ with three toy examples. While we focus on the issue of bias, we also compare the goodness-of-fit test and the interval estimation in the first example to support the discussion in Sec. 2.1 and Sec. 2.2. For completeness, we also include $\chi^{2}_{\mathrm{Gauss}}$ in the first example, since it performs similarly to $\chi^{2}_{\mathrm{CNP}}$ in certain scenarios.

3.1 Example 1: simple counting

The first example is similar to the one introduced in Sec. 2. In each toy experiment, a set of $n$ independent counting measurements is performed to measure a common expected value $\mu$. The $\chi^{2}$ curves of one simulated toy experiment with $n=10$ and $\mu_{\mathrm{true}}=15$ are shown in the left panel of Fig. 1. The location of the minimum of each $\chi^{2}$ curve gives the estimator $\hat{\mu}$. It is clear that $\hat{\mu}_{\mathrm{Neyman}}<\hat{\mu}_{\mathrm{CNP}}\approx\hat{\mu}_{\mathrm{Poisson}}\approx\hat{\mu}_{\mathrm{Gauss}}<\hat{\mu}_{\mathrm{Pearson}}$, and that the CNP chi-square curve closely resembles the Poisson-likelihood chi-square, as expected from the previous section.

The relative biases of $\hat{\mu}$ using $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, $\chi^{2}_{\mathrm{Gauss}}$, and $\chi^{2}_{\mathrm{CNP}}$ are shown in the right panel of Fig. 1 for 10 million toy experiments. The bias using $\chi^{2}_{\mathrm{Poisson}}$ is zero. The biases using $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ have opposite signs, and the magnitude of the mean bias using $\chi^{2}_{\mathrm{Neyman}}$ is about twice that using $\chi^{2}_{\mathrm{Pearson}}$. The bias using $\chi^{2}_{\mathrm{CNP}}$ is an order of magnitude smaller than those using $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$, and the bias using $\chi^{2}_{\mathrm{Gauss}}$ is similar to that using $\chi^{2}_{\mathrm{CNP}}$. The variance of $\hat{\mu}_{\mathrm{Neyman}}$ is notably larger than those of the other four estimators, which are similar to one another.

In Fig. 2, we further study the biases of $\hat{\mu}$ for different values of $\mu_{\mathrm{true}}$ and of the number of measurements $n$. The biases using $\chi^{2}_{\mathrm{Poisson}}$ are always zero, as expected from an unbiased estimator in this simple example. The biases using $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, and $\chi^{2}_{\mathrm{CNP}}$ become larger as $n$ increases. This behavior may not be intuitive, but it is well known; a proof is provided in Appendix B. As $\mu$ and $n$ increase, the biases of $\hat{\mu}_{\mathrm{Pearson}}$ and $\hat{\mu}_{\mathrm{Neyman}}$ approach $1/2$ and $-1$, respectively. This is because, for large $n$, $\hat{\mu}_{\mathrm{Pearson}}\to\sqrt{\mu^{2}+\mu}\approx\mu+1/2$ and $\hat{\mu}_{\mathrm{Neyman}}\to1/E[1/M_{i}]\approx\mu-1$ for large $\mu$. Besides these observations, the general features of the biases stay the same as discussed previously. Most importantly, $\chi^{2}_{\mathrm{CNP}}$ yields a much smaller bias than $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$ in all cases.
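A reduced-scale version of this bias study can be sketched with the closed-form estimators for the Poisson, Neyman, Pearson, and CNP chi-squares. The settings ($\mu_{\mathrm{true}}=25$, $n=10$, $10^{4}$ toys, fixed seed) are our own choices, far smaller than the 10 million toys used here. Toys containing a zero count are simply skipped rather than treated as in Appendix A. The sketch only aims to reproduce the sign pattern and the rough relative size of the biases.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method for a Poisson variate (fine for lam of a few tens)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def mean_biases(mu_true=25.0, n=10, n_toys=10000, seed=1):
    """Average (mu_hat - mu_true) for the Poisson, Neyman, Pearson, and CNP estimators."""
    rng = random.Random(seed)
    sums = {"Poisson": 0.0, "Neyman": 0.0, "Pearson": 0.0, "CNP": 0.0}
    used = 0
    for _ in range(n_toys):
        M = [poisson_draw(rng, mu_true) for _ in range(n)]
        if min(M) == 0:  # simplification: skip toys with an empty bin
            continue
        s1 = sum(M)
        s2 = sum(m * m for m in M)
        s_inv = sum(1.0 / m for m in M)
        sums["Poisson"] += s1 / n                   # arithmetic mean
        sums["Neyman"] += n / s_inv                 # harmonic mean
        sums["Pearson"] += math.sqrt(s2 / n)        # root mean square
        sums["CNP"] += (s2 / s_inv) ** (1.0 / 3.0)  # cube-root combination
        used += 1
    return {name: total / used - mu_true for name, total in sums.items()}


biases = mean_biases()
for name, bias in biases.items():
    print(f"{name:8s} {bias:+.3f}")
```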

Figure 2 also shows the performance of $\chi^{2}_{\mathrm{Gauss}}$, which is another way to mitigate the bias issue. Similar to $\chi^{2}_{\mathrm{CNP}}$, $\chi^{2}_{\mathrm{Gauss}}$ performs much better than $\chi^{2}_{\mathrm{Neyman}}$ or $\chi^{2}_{\mathrm{Pearson}}$. We note that the bias of $\hat{\mu}_{\mathrm{Gauss}}$ depends less on $\mu$ and becomes smaller as $n$ increases. This is expected from the central limit theorem, which states that the sum of a large number of independent, identically distributed random variables approximately follows a normal distribution. Therefore, when the number of measurements is large, $\chi^{2}_{\mathrm{Gauss}}$ performs better even when $\mu$ is small. On the other hand, when the number of measurements is not large, $\chi^{2}_{\mathrm{CNP}}$ performs better.

Next, we compare the performance in the goodness-of-fit test. The left panel of Fig. 3 shows the distribution of the five test statistics evaluated at $\mu_{\mathrm{true}}=15$ in the $n=10$ setting with 10 million toy experiments. The ideal chi-square distribution with 10 degrees of freedom is also shown for comparison. All five test statistics deviate from the ideal chi-square distribution, with $\chi^{2}_{\mathrm{Pearson}}$ being the closest and $\chi^{2}_{\mathrm{Neyman}}$ deviating the most. The mean of $\chi^{2}_{\mathrm{Pearson}}$ is exactly 10, since each bin contributes $E[(M_{i}-\mu_{\mathrm{true}})^{2}]/\mu_{\mathrm{true}}=1$, and the mean of $\chi^{2}_{\mathrm{Neyman}}$ is the largest. The right panel of Fig. 3 shows the relative deviation of the mean from the number of degrees of freedom ($\mathrm{ndf}=10$ in all toy experiments) for the five test statistics as a function of $\mu_{\mathrm{true}}$. Except for $\chi^{2}_{\mathrm{Pearson}}$, the other four test statistics are not ideal in this metric when $\mu_{\mathrm{true}}$ is less than a few tens, with $\chi^{2}_{\mathrm{Neyman}}$ being the worst. Ref. [6] provides a good discussion of this behavior.

In practice, $\mu_{\mathrm{true}}$ is unknown, and experiments often report $\chi^{2}_{\textrm{min}}$ (evaluated at $\hat{\mu}$) as a metric for the goodness-of-fit test. The left panel of Fig. 4 shows the results of this test for the same $n=10$ setting as in Fig. 3. Note that when $\chi^{2}$ is evaluated at $\hat{\mu}$, the number of degrees of freedom decreases by one (ndf = 9). All five test statistics yield poor results in this goodness-of-fit metric when $\mu_{\mathrm{true}}$ is less than $\sim$10, indicating large deviations from the chi-square distribution in those cases. On the other hand, inspired by Fig. 3, we can use $\chi^{2}_{\mathrm{Pearson}}$ to perform the goodness-of-fit test but evaluate it at a $\hat{\mu}$ obtained from a different test statistic. The right panel of Fig. 4 shows the results. When $\chi^{2}_{\mathrm{Pearson}}$ is evaluated at a less-biased estimator $\hat{\mu}$ (e.g. $\hat{\mu}_{\textrm{Poisson}}$, $\hat{\mu}_{\textrm{Gauss}}$, or $\hat{\mu}_{\textrm{CNP}}$), it provides a better goodness-of-fit metric, which confirms our recommendation in Sec. 2.1.

To compare the performance on interval estimation, Fig. 5 shows the $\Delta\chi^{2}$ distributions in the $n=10$ and $\mu_{\textrm{true}}=15$ setting with 10 million toy experiments, where $\Delta\chi^{2}=\chi^{2}(\mu=\mu_{\textrm{true}})-\chi^{2}(\mu=\hat{\mu})$. As discussed in Sec. 2.2, when the conditions of Wilks' theorem [17] are met, $\Delta\chi^{2}$ in this example is expected to follow the chi-square distribution with one degree of freedom. However, except for $\chi^{2}_{\mathrm{Poisson}}$, the four other test statistics clearly deviate from the ideal $\chi^{2}(1)$ distribution, so using the $\Delta\chi^{2}=1$ rule to set 68% confidence intervals would give incorrect coverage. We therefore construct the 68% confidence intervals with the Feldman-Cousins approach [18]. First, a scan over $\mu$ values is performed. Taking each test value of $\mu$ as the true value, many toy experiments are generated to obtain its $\Delta\chi^{2}$ distribution. Then, from each $\Delta\chi^{2}$ distribution, a critical value $\Delta\chi^{2}_{c}(68\%)$ is determined such that 68% of the toy experiments lie below it. For example, given the distributions shown in Fig. 5, the critical values $\Delta\chi^{2}_{c}(68\%)$ for $\chi^{2}_{\mathrm{Neyman}}$, $\chi^{2}_{\mathrm{Pearson}}$, $\chi^{2}_{\mathrm{Gauss}}$, and $\chi^{2}_{\mathrm{CNP}}$ are larger than one as a result of their biases in $\hat{\mu}$. Finally, we return to the original toy experiments generated with $\mu_{\textrm{true}}=15$. The confidence interval of each toy experiment is set by comparing its $\Delta\chi^{2}$ value with the critical value $\Delta\chi^{2}_{c}(68\%)$ at each test value of $\mu$: the 68% confidence interval contains all test values of $\mu$ with $\Delta\chi^{2}<\Delta\chi^{2}_{c}(68\%)$. This procedure is repeated for each of the 10 million toy experiments to obtain its 68% confidence interval.
The reported lower limit $\hat{\mu}^{-\sigma}_{1/2}$ and upper limit $\hat{\mu}^{+\sigma}_{1/2}$ of the 68% confidence interval are the median values over all toy experiments and are tabulated in Table 1. As shown, $\chi^{2}_{\mathrm{CNP}}$ and $\chi^{2}_{\mathrm{Gauss}}$ yield similar (average) interval sizes. Both are larger than that of $\chi^{2}_{\mathrm{Poisson}}$ but considerably smaller than those of $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{Neyman}}$. Two effects cause the larger intervals of $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{Neyman}}$. First, $\hat{\mu}_{\mathrm{Neyman}}$ has a notably larger variance, as shown in Fig. 1. Second, by construction $\mu_{\textrm{true}}$ is always contained in the ensemble-median confidence interval, though not necessarily near its center (see footnote 4). The larger biases of $\hat{\mu}_{\mathrm{Pearson}}$ and $\hat{\mu}_{\mathrm{Neyman}}$ therefore also widen their intervals.

Footnote 4: In the frequentist definition of the 68% confidence interval (C.I.), if one performs a large number of similar experiments, the interval would contain $\mu_{\mathrm{true}}$ in 68% of the cases. This means the lower limit of the 68% C.I. is lower than $\mu_{\mathrm{true}}$ in at least 68% of the experiments, so the median of the lower limit of the 68% C.I., $\hat{\mu}^{-\sigma}_{1/2}$, is always lower than $\mu_{\mathrm{true}}$. Similarly, the median of the upper limit of the 68% C.I., $\hat{\mu}^{+\sigma}_{1/2}$, is always higher than $\mu_{\mathrm{true}}$.
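The Feldman-Cousins construction described above can be sketched for the common-mean model of this example. This is a minimal illustration under our own assumptions, not the code behind Table 1:

- $\Delta\chi^{2}$ is written in closed form by profiling $\mu$ analytically, for the Poisson, Neyman, and Pearson statistics only.
- Toys containing an empty bin are discarded rather than treated as in Appendix A.
- 20,000 toys are generated per test value, with a fixed seed.

All function names are hypothetical.

```python
import numpy as np


def delta_chi2(M, mu, stat):
    """chi2(mu) - chi2(mu_hat) per toy for the common-mean model (rows = toys, all M > 0)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[1]
    T = M.sum(axis=1)
    if stat == "poisson":   # mu_hat = arithmetic mean
        return 2.0 * (n * mu - T + T * np.log(T / (n * mu)))
    if stat == "neyman":    # mu_hat = n / S (harmonic mean)
        S = (1.0 / M).sum(axis=1)
        return S * (mu - n / S) ** 2
    if stat == "pearson":   # mu_hat = sqrt(Q / n) (quadratic mean)
        Q = (M ** 2).sum(axis=1)
        return (np.sqrt(Q / mu) - np.sqrt(n * mu)) ** 2
    raise ValueError(f"unknown statistic: {stat}")


def critical_value(mu_test, stat, n=10, n_toys=20_000, cl=0.68, seed=0):
    """Delta chi2_c(cl): the cl-quantile of Delta chi2 for toys generated at mu_test."""
    rng = np.random.default_rng(seed)
    M = rng.poisson(mu_test, size=(n_toys, n))
    M = M[(M > 0).all(axis=1)]  # empty bins skipped for simplicity
    return float(np.quantile(delta_chi2(M, mu_test, stat), cl))


def fc_interval(M_obs, stat, mu_grid, **kw):
    """Accept every test mu whose observed Delta chi2 lies below its critical value."""
    M_obs = np.asarray(M_obs, dtype=float)[None, :]
    accepted = [mu for mu in mu_grid
                if delta_chi2(M_obs, mu, stat)[0] < critical_value(mu, stat, **kw)]
    return min(accepted), max(accepted)


print(critical_value(15.0, "poisson"), critical_value(15.0, "neyman"))
print(fc_interval([14, 17, 15, 12, 18, 16, 13, 15, 19, 11], "neyman",
                  np.arange(10.0, 20.01, 0.25)))
```

In this sketch the Poisson critical value stays close to one, as Wilks' theorem suggests. The Neyman critical value comes out clearly above one, reflecting the bias of $\hat{\mu}_{\mathrm{Neyman}}$. Taking the minimum and maximum of the accepted grid points assumes the accepted region has no gaps.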

Next, we show two more examples of increasing complexity, inspired by real experiments. Since $\chi^{2}_{\textrm{Gauss}}$ generally performs similarly to $\chi^{2}_{\textrm{CNP}}$ and can also benefit from the covariance matrix formalism, we restrict the comparison of $\chi^{2}_{\textrm{CNP}}$ to $\chi^{2}_{\mathrm{Poisson}}$, $\chi^{2}_{\mathrm{Neyman}}$, and $\chi^{2}_{\mathrm{Pearson}}$. The following studies focus on the bias of the point estimates of the model parameters, since the performance on the goodness-of-fit test and on interval estimation is similar to that in the first example.

3.2 Example 2: fitting multi-detector histograms

In this section, we introduce a more realistic example inspired by the PROSPECT reactor neutrino experiment [19], which searches for a light sterile neutrino [20]. A unique feature of PROSPECT is that the detector consists of many segmented sub-detectors, each recording a modest number of events ($\sim$ a few hundred). Since each sub-detector has a different baseline to the reactor, it is desirable to treat each sub-detector separately in the spectral fit. This increases the physics sensitivity to the energy- and baseline-dependent oscillation effect caused by a hypothetical light sterile neutrino.

In our toy experiment, we assume 100 sub-detectors, each measuring a common energy spectrum with 16 energy bins. The expected spectrum is assumed to be flat, with an unknown normalization bias factor $\epsilon$ to be determined (see footnote 5). In the $i$th bin of the $d$th detector, $\mu_{d}^{i}$ signal events and $b_{d}^{i}$ background events are expected, and $M_{d}^{i}$ total events are measured. The background shape is also assumed to be flat, and the expected background $b_{d}^{i}$ is assumed to be half the size of the expected signal $\mu_{d}^{i}$. The experiment also measures $B_{d}^{i}$ background events in a signal-off period, which provides an external constraint on the background. For simplicity, we assume the signal-off period has the same length as the signal-on period. We consider one systematic uncertainty, the relative normalization uncertainty $\epsilon_{d}$ among detectors, and assume it is constrained to 2%. Therefore, in this example there is one model parameter, $\epsilon$, and 1700 nuisance parameters to be estimated: 1600 background parameters $b_{d}^{i}$ and 100 detector normalizations $\epsilon_{d}$.

Footnote 5: Appendix E shows an example where the shape of the histogram is also a model parameter.

The Poisson-likelihood chi-square function for this toy experiment can be written as:

[TABLE]

and for the CNP chi-square:

[TABLE]

The $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ can be constructed similarly by changing the denominators of the first and the second terms in Eq. (21).

Minimizing the above chi-square functions involves finding the best-fit values of the 1700 nuisance parameters, which can make the fit unstable. To reduce the number of nuisance parameters, we can instead find their best-fit values by solving the corresponding minimization conditions, e.g. $\partial\chi^{2}/\partial b_{d}^{i}=0$. In this simple example, the background nuisance parameters are independent of each other, so each such condition is a single equation in one unknown. It is linear for $\chi^{2}_{\mathrm{Neyman}}$, quadratic for $\chi^{2}_{\mathrm{Poisson}}$, quartic for $\chi^{2}_{\mathrm{Pearson}}$, and quintic for $\chi^{2}_{\mathrm{CNP}}$. These equations can be solved analytically (up to 4th order) or numerically (beyond 4th order).
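For a single bin, the minimization condition can be written out explicitly. Because Eqs. (20) and (21) are not reproduced here, the sketch assumes a per-bin structure: a signal-on term with prediction $s+b$, where $s$ is the normalized signal expectation at fixed $\epsilon$ and $\epsilon_d$, plus a signal-off term with prediction $b$. Under that assumption, $\partial\chi^{2}_{\mathrm{Poisson}}/\partial b=0$ reduces to $2b^2+(2s-M-B)b-Bs=0$, and the positive root is taken. For $\chi^{2}_{\mathrm{CNP}}$, the condition multiplied by $b^2(s+b)^2$ is a quintic, which is solved with NumPy's polynomial root finder; the real positive root with the smallest $\chi^{2}$ is kept. Function names and the numbers in the example are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def chi2_poisson_bin(b, s, M, B):
    """Signal-on plus signal-off Poisson-likelihood terms of one bin (M, B > 0 assumed)."""
    return (2 * (s + b - M + M * np.log(M / (s + b)))
            + 2 * (b - B + B * np.log(B / b)))


def chi2_cnp_bin(b, s, M, B):
    """Same bin with CNP variances 3 / (1/M + 2/prediction)."""
    return ((s + b - M) ** 2 * (1 / M + 2 / (s + b)) / 3
            + (b - B) ** 2 * (1 / B + 2 / b) / 3)


def profile_b_poisson(s, M, B):
    """Positive root of 2 b^2 + (2 s - M - B) b - B s = 0."""
    c1 = 2 * s - M - B
    return (-c1 + np.sqrt(c1 ** 2 + 8 * B * s)) / 4


def profile_b_cnp(s, M, B):
    """Solve the quintic d(chi2_CNP)/db = 0 numerically and keep the best real root."""
    b, x = Polynomial([0.0, 1.0]), Polynomial([s, 1.0])  # x = s + b
    # (3/2) * d(chi2_CNP)/db multiplied by b^2 x^2: a degree-5 polynomial in b
    poly = (((x - M) * (1 / M) + (b - B) * (1 / B) + 2) * b**2 * x**2
            - M**2 * b**2 - B**2 * x**2)
    r = poly.roots()
    cands = r[(np.abs(r.imag) < 1e-9) & (r.real > 0)].real
    return float(cands[np.argmin(chi2_cnp_bin(cands, s, M, B))])


s, M, B = 30.0, 48.0, 12.0
print(profile_b_poisson(s, M, B), profile_b_cnp(s, M, B))
```

A positive root always exists for the CNP case, because $\chi^{2}_{\mathrm{CNP}}$ diverges both as $b\to 0^{+}$ and as $b\to\infty$.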

One hundred thousand toy experiments are simulated assuming the nominal signal $\mu_{d}^{i}=30$ and background $b_{d}^{i}=15$ in each bin. The normalization bias factor $\epsilon$ is fitted for each experiment, where the true value of $\epsilon$ is set to zero. The results of using $\chi^{2}_{\mathrm{Poisson}}$ , $\chi^{2}_{\mathrm{Neyman}}$ , $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{CNP}}$ are shown in Fig. 6. Despite being small, the bias of $\chi^{2}_{\mathrm{Poisson}}$ is non-zero. This is caused by the introduction of penalty terms in Eq. (20) (see C for an explanation). One can see that the bias of $\chi^{2}_{\mathrm{CNP}}$ is again much smaller than those of $\chi^{2}_{\mathrm{Neyman}}$ and $\chi^{2}_{\mathrm{Pearson}}$ , representing a much better approximation to $\chi^{2}_{\mathrm{Poisson}}$ .

3.3 Example 3: covariance matrix implementation

In many physics experiments, covariance matrices are used to model complicated systematic uncertainties, either because a direct nuisance-parameter implementation is difficult or because there are too many nuisance parameters to minimize. In this section, we show how $\chi^{2}_{\mathrm{CNP}}$ can be implemented in covariance matrix format.

We add a slight complication to the previous example that makes finding the best-fit nuisance parameters analytically or numerically prohibitively difficult during the minimization. We assume the detector response changed between the signal-on and signal-off periods. A transfer matrix $R$ is therefore needed to translate the expected background $b_{d}^{i}$ in the signal-off period to the signal-on period, such that $(b_{d}^{i})_{\mathrm{on}}=\sum_{j}R_{d}^{ij}b_{d}^{j}$. For simplicity, 10 sub-detectors are used in this example. The transfer matrix $R$ performs a simple smearing across energy bins: for each detector, $R^{ij}_{d}=0.5$ when $i=j$, $R^{ij}_{d}=0.25$ when $i=j\pm 1$, and $R^{ij}_{d}=0$ otherwise. The $\chi^{2}_{\mathrm{CNP}}$ in this example becomes:

[TABLE]

In this case, solving for the nuisance parameters through $\partial\chi^{2}/\partial b_{d}^{i}=0$ would lead to a set of quintic equations, which is difficult to solve either analytically or numerically. Following Sec. 2, the covariance matrix format of Eq. (22) is:

[TABLE]

where $M_{d}^{i}$, $\mu_{d}^{i}$, $b_{d}^{i}$, and $B_{d}^{i}$ are each ordered into a single 160-element vector, $\bm{M}$, $\bm{\mu}$, $\bm{b}$, and $\bm{B}$, respectively. $V^{\mathrm{stat}}_{\mathrm{CNP}}$ is the covariance matrix corresponding to the statistical uncertainty. It is diagonal, with elements given by the corresponding denominators of the first term of Eq. (22). Similarly, $V^{\mathrm{bkg}}_{\mathrm{CNP}}$ is the covariance matrix corresponding to the statistical uncertainty of the background, with diagonal elements given by the denominators of the second term of Eq. (22). $V^{\mathrm{syst}}$ is the covariance matrix corresponding to the systematic uncertainty $\epsilon_{d}$. It can be calculated either analytically or from toy Monte Carlo simulations that randomly fluctuate the number of events according to $\epsilon_{d}$ and its constraint.
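The ingredients named above can be sketched in NumPy. Everything here is illustrative and reflects our own assumptions rather than the paper's code:

- the 160-element vectors use detector-major ordering;
- $\epsilon_d$ enters as a multiplicative factor $(1+\epsilon_d)$ on the signal only;
- the signal is a flat 30 and the background a flat 15 per bin;
- the function names are hypothetical.

The transfer matrix follows the definition above literally, so within each detector its first and last rows sum to 0.75. The full Eq. (23), including the background block, is not reproduced. The sketch only checks two things: that a diagonal $V^{\mathrm{stat}}_{\mathrm{CNP}}$ reproduces the bin-by-bin CNP sum, and that a toy-MC $V^{\mathrm{syst}}$ agrees with the analytic one.

```python
import numpy as np

N_DET, N_BINS, SIGMA_D = 10, 16, 0.02            # Example 3 sizes; 2% constraint on eps_d
det_index = np.repeat(np.arange(N_DET), N_BINS)  # detector label of each of the 160 bins


def transfer_matrix(diag=0.5, off=0.25):
    """Block-diagonal R: 0.5 on i = j and 0.25 on i = j +/- 1 within each detector."""
    R_d = diag * np.eye(N_BINS) + off * (np.eye(N_BINS, k=1) + np.eye(N_BINS, k=-1))
    return np.kron(np.eye(N_DET), R_d)  # no mixing between detectors


def v_stat_cnp(M, pred):
    """Diagonal CNP statistical covariance, 3 / (1/M + 2/pred) per bin (M > 0)."""
    return np.diag(3.0 / (1.0 / M + 2.0 / pred))


def v_syst_analytic(signal, sigma=SIGMA_D):
    """eps_d fully correlated within a detector, uncorrelated across detectors."""
    same_det = det_index[:, None] == det_index[None, :]
    return sigma**2 * np.outer(signal, signal) * same_det


def v_syst_toys(signal, sigma=SIGMA_D, n_toys=20_000, seed=3):
    """Toy-MC estimate: scale each detector's signal by (1 + eps_d), eps_d ~ N(0, sigma)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_toys, N_DET))
    return np.cov(signal * (1.0 + eps[:, det_index]), rowvar=False)


def chi2_cov(M, pred, V):
    """(M - pred)^T V^{-1} (M - pred) via a linear solve rather than an explicit inverse."""
    r = M - pred
    return float(r @ np.linalg.solve(V, r))


signal = np.full(N_DET * N_BINS, 30.0)
b_on = transfer_matrix() @ np.full(N_DET * N_BINS, 15.0)  # (b_d^i)_on = sum_j R_d^ij b_d^j
V_an, V_mc = v_syst_analytic(signal), v_syst_toys(signal)
print(b_on[:3], np.abs(V_an - V_mc).max())
```

Solving the linear system instead of forming $V^{-1}$ explicitly is the usual numerically safer choice when $V$ becomes large or poorly conditioned.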

Following the same procedure, covariance matrix formats can be constructed for $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{Neyman}}$. The statistical uncertainty terms of the covariance matrix in Eq. (23), $V^{\mathrm{stat}}_{\mathrm{CNP}}$ and $V^{\mathrm{bkg}}_{\mathrm{CNP}}$, are replaced with their counterparts in $\chi^{2}_{\mathrm{Pearson}}$ and $\chi^{2}_{\mathrm{Neyman}}$. We note that there is no equivalent covariance matrix format for the Poisson-likelihood chi-square. One hundred thousand toy experiments are simulated assuming nominal signal $\mu_{d}^{i}=30$ and background $b_{d}^{i}=15$ in each bin. The normalization bias factor $\epsilon$ is fitted for each experiment, with its true value set to zero. The results are shown in the left panel of Fig. 7. In the covariance format, the bias of $(\chi^{2}_{\mathrm{CNP}})_{\mathrm{cov}}$ is again more than an order of magnitude smaller than those of $(\chi^{2}_{\mathrm{Neyman}})_{\mathrm{cov}}$ and $(\chi^{2}_{\mathrm{Pearson}})_{\mathrm{cov}}$.

We emphasize that in $(\chi^{2}_{\mathrm{CNP}})_{\mathrm{cov}}$ as defined in Eq. (23), both the free parameter $\epsilon$ and the nuisance parameters $b_{d}^{i}$ must be fitted. This follows from the Poisson nature of the statistical uncertainty on the background and from how it is treated in the CNP chi-square. It is tempting to reduce the number of nuisance parameters further by absorbing them into a fixed covariance matrix. To do so, we must approximate the expected $b_{d}^{i}$ by their measured values $B_{d}^{i}$. In this case, Eqs. (22) and (23) are replaced by

[TABLE]

and

[TABLE]

where the nuisance parameters $b_{d}^{i}$ are absorbed into $V^{\prime\,\mathrm{bkg}}$. After these approximations, only one free parameter, $\epsilon$, needs to be minimized in $\left(\chi^{\prime\,2}_{\mathrm{CNP}}\right)_{\mathrm{cov}}$, instead of the 161 fitting parameters in Eq. (23), and the computational cost is greatly reduced. Similar approximations can be made for $\left(\chi^{\prime\,2}_{\mathrm{Pearson}}\right)_{\mathrm{cov}}$; the fitting results are shown in the right panel of Fig. 7. Although the approximation greatly reduces the number of fitting parameters, the bias of the normalization factor $\epsilon$ becomes significantly larger, in particular for the CNP chi-square. It is therefore crucial, when reporting results, to state clearly how the $\chi^{2}$ is defined and what approximations are implied in the construction of the covariance matrix.

4 Discussions

Through examples in the previous section, we have compared various chi-square construction methods and different minimization strategies. In the following, we provide some recommendations on when to use them in the data analysis of counting experiments:

1. When the computational cost is not a concern (e.g. the number of nuisance parameters is small), a direct minimization of the Poisson-likelihood chi-square (with nuisance parameters implemented through pull terms) should be used.

2. When the computational cost of a direct minimization is high, one should first look for analytic or numerical solutions, which can effectively reduce the number of nuisance parameters without making any approximations. For example, the number of nuisance parameters of the Poisson-likelihood chi-square in the example described in Sec. 3.2 can be reduced by solving a set of independent quadratic equations.

3. When analytic or numerical solutions are not available, approximations may become necessary to reduce the computational cost. In this case, the covariance matrix formalism is a common tool for reducing the number of nuisance parameters. However, before approximating the Poisson-likelihood chi-square by Neyman's, Pearson's, Gauss-likelihood, or CNP chi-squares, one should check whether it is sufficient to apply the covariance matrix only to the pull terms of the systematic uncertainties. For example, the rate plus shape oscillation fit described in Ref. [21] used a covariance matrix in the pull term for reactor-related uncertainties. In this approach, the statistical part of the chi-square function can still use the Poisson-likelihood format.

4. When the Poisson-likelihood chi-square has to be replaced, the iterative approach with weighted least squares described in Refs. [10, 11, 12] is one option for eliminating the bias of the estimator. An alternative is the CNP or the Gauss-likelihood chi-square, both of which lead to a much smaller bias in the estimated model parameters than either Neyman's or Pearson's chi-square. As shown in Fig. 2 of Sec. 3.1, either the CNP or the Gauss-likelihood chi-square may be the better choice of test statistic, depending on the number of measurements. In addition, the reduced bias is often accompanied by improved confidence intervals (smaller in size or with more accurate coverage), as discussed in Sec. 2.2 and shown in Sec. 3.1. Similarly, analytic or numerical solutions should be explored before applying a covariance matrix approach, since additional approximations are necessary in the latter case. As shown in Sec. 2, the derivation of the covariance matrix formula assumes that i) the variance describing statistical fluctuations is independent of any nuisance parameters, and ii) the predicted counts depend only linearly on the nuisance parameters. Approximations made to satisfy these conditions can be costly: for example, the approximation made in the right panel of Fig. 7 leads to a significant bias.
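Item 4 mentions the iterative weighted least-squares approach of Refs. [10, 11, 12]. The sketch below is one common way to realize that idea and is not taken from those references. For a model linear in its parameters, $\bm{\mu}=A\bm{\theta}$, it repeatedly solves a weighted least-squares problem whose weights $1/\mu_i$ are frozen at the previous iterate. At a fixed point, the normal equations $A^{T}\mathrm{diag}(1/\mu)(\bm{M}-\bm{\mu})=0$ coincide with the score equations of the Poisson likelihood. So, when the iteration converges, it lands on the Poisson maximum-likelihood estimate. The linear model ($p_0=8$, $p_1=20$, bin centers 0.1 to 1.0) is borrowed from Appendix E. The starting weights, tolerance, iteration cap, seed, and function name are arbitrary, and the loop assumes every predicted count stays positive.

```python
import numpy as np


def irls_linear_poisson(M, A, n_iter=100, tol=1e-10):
    """Iterate weighted least squares with weights 1/mu taken from the previous step."""
    M = np.asarray(M, dtype=float)
    w = 1.0 / np.maximum(M, 1.0)  # start from Neyman-like weights (guard empty bins)
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        Aw = A * w[:, None]
        new = np.linalg.solve(A.T @ Aw, Aw.T @ M)  # argmin sum w (M - A theta)^2
        if np.max(np.abs(new - theta)) < tol:
            theta = new
            break
        theta = new
        w = 1.0 / (A @ theta)  # variance = current prediction (Pearson-like)
    return theta


x = np.arange(1, 11) / 10.0                # bin centers 0.1 ... 1.0
A = np.column_stack([np.ones_like(x), x])  # mu_i = p0 + p1 * x_i
rng = np.random.default_rng(11)
M = rng.poisson(8.0 + 20.0 * x)
theta = irls_linear_poisson(M, A)
mu = A @ theta
print(theta, A.T @ ((M - mu) / mu))  # second vector ~ 0: Poisson score equations
```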

We emphasize that since there are many different ways to make approximations in defining the chi-square test statistics, it is extremely important for experiments to clearly report how their test statistics are constructed.

In summary, we have proposed a linear combination of Neyman's and Pearson's chi-squares, $\chi^{2}_{\mathrm{CNP}}$, as an improved approximation to the widely used Poisson-likelihood chi-square in counting experiments. Using three examples, we have shown that the bias in parameter estimation from the CNP chi-square is much smaller than that from Neyman's or Pearson's chi-square alone. When the computational cost of the Poisson-likelihood chi-square is high, the CNP chi-square, with its covariance matrix format, provides a good alternative.

Acknowledgments

We thank Maxim Gonchar and Mike Shaevitz for suggesting the comparison of the CNP chi-square with the Gauss-likelihood chi-square. This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, and Early Career Research Program under contract number DE-SC0012704.

Appendix A Treatment of bins with zero observed events

Experiments often have bins with zero counts when the expected signal is small. In this case, Neyman's chi-square definition, Eq. (8), breaks down because the measured number of events appears in the denominator; the same holds for the CNP and Gauss-likelihood chi-square definitions. In practice, experiments either ignore bins with zero observed events or assign a statistical uncertainty of 1 to zero-count bins (e.g. the "modified Neyman's chi-square" [6]). Here we adopt the Poisson-likelihood chi-square definition for zero-count bins:

[TABLE]

Eq. (26) can be re-written in a weighted least-squares format:

[TABLE]

Compared with Pearson's chi-square, the effective variance for a zero-count bin ($\mu/2$) is half of that in $\chi^{2}_{\mathrm{Pearson}}$ ($\mu$). The covariance matrix element corresponding to a zero-count bin is therefore:

[TABLE]

In this paper, we use Eqs. (26) and (28) whenever zero-count bins are encountered.

Appendix B Bias of estimator $\hat{\mu}_{\mathrm{Neyman}}$ and $\hat{\mu}_{\mathrm{Pearson}}$ versus number of measurements

Here we prove that the biases of $\hat{\mu}_{\mathrm{Neyman}}$ and $\hat{\mu}_{\mathrm{Pearson}}$ grow in magnitude as the number of measurements $n$ increases, as shown in Fig. 2. Making use of the relations

[TABLE]

for $\hat{\mu}_{\mathrm{Neyman}}$ we have:

[TABLE]

where $E(M_{i})=\mathrm{Var}(M_{i})=\mu$ since $M_{i}$ follows a Poisson distribution. The expected bias then becomes:

[TABLE]

which deviates further from zero as $n$ increases. The bias approaches $-1$ when $n$ and $\mu$ become large (see footnote 6).

Footnote 6: Note that for the dependence on $\mu$, Eq. (31) is only asymptotically correct when $n$ and $\mu$ are large, owing to the approximation made in Eq. (29). The exact dependence on $\mu$ when $n\to\infty$ can only be written as an infinite summation (e.g. $E(\hat{\mu}_{\mathrm{Neyman}})=(e^{\mu}-1)/\left(1+\sum_{k=1}^{\infty}\frac{\mu^{k}}{k(k!)}\right)$). One derivation can be found in Ref. [13].

Similarly, for $\hat{\mu}_{\mathrm{Pearson}}$ we have:

[TABLE]

therefore:

[TABLE]

which also grows with $n$, since the variance of $\hat{\mu}_{\mathrm{Pearson}}$ decreases as $n$ increases. The bias approaches 1/2 when $n$ and $\mu$ become large.
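A quick Monte Carlo check of these limits, as a sketch under our own choices: $\mu=50$, and a fixed budget of $10^6$ Poisson draws split into toys of $n$ bins each. Toys with an empty bin are discarded, which is irrelevant at $\mu=50$. The seed and function name are hypothetical. The closed forms $\hat{\mu}_{\mathrm{Neyman}}=n/\sum_i(1/M_i)$ and $\hat{\mu}_{\mathrm{Pearson}}=\sqrt{\sum_i M_i^2/n}$ are the minimizers of the two statistics for the common-mean model.

```python
import numpy as np


def mean_bias(mu, n, total_draws=1_000_000, seed=5):
    """Toy-averaged bias of the Neyman (harmonic-mean) and Pearson (quadratic-mean) estimators."""
    rng = np.random.default_rng(seed)
    n_toys = max(total_draws // n, 1)
    M = rng.poisson(mu, size=(n_toys, n)).astype(float)
    M = M[(M > 0).all(axis=1)]  # Neyman's estimator needs non-empty bins
    mu_neyman = n / (1.0 / M).sum(axis=1)
    mu_pearson = np.sqrt((M ** 2).mean(axis=1))
    return float(mu_neyman.mean() - mu), float(mu_pearson.mean() - mu)


for n in (2, 10, 100, 1000):
    print(n, mean_bias(50.0, n))
```

Both biases should grow in magnitude with $n$, ending up near $-1$ (Neyman) and $+1/2$ (Pearson) at $n=1000$.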

Appendix C Bias of $\chi^{2}_{\mathrm{Poisson}}$ when pull terms are included

In this appendix, we explain the non-zero bias of $\epsilon$ obtained from $\chi^{2}_{\mathrm{Poisson}}$ when pull terms are included, for example in Eq. (20). Consider a simplified example: an experiment measures $m$ events, which follow a Poisson distribution with mean $\mu$. There is one systematic uncertainty, $\epsilon$, on the normalization of $\mu$, constrained with a standard deviation $\sigma$. Following the maximum-likelihood principle, the Poisson-likelihood chi-square with the constraint on $\epsilon$ is:

[TABLE]

The estimator $\hat{\epsilon}$ of $\epsilon$ is obtained by minimizing the chi-square, $\partial\chi^{2}_{\mathrm{Poisson}}/\partial\epsilon=0$:

[TABLE]

Defining $x=\frac{4\sigma^{2}}{(1+\mu\sigma^{2})^{2}}(\mu-m)$ and assuming $|x|\ll 1$ , we can perform a Taylor expansion on Eq. (35) and obtain:

[TABLE]

Ignoring higher-order terms, the expectation of $\hat{\epsilon}$ is

[TABLE]

Given that $E(x)$ is zero and $E(x^{2})$ is non-zero, $\hat{\epsilon}$ is a biased estimator in this example. It becomes unbiased only asymptotically, in the large-statistics limit [5].
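Because Eqs. (34) to (37) are not reproduced above, the sketch below assumes $\chi^{2}_{\mathrm{Poisson}}=2[(1+\epsilon)\mu-m+m\ln(m/((1+\epsilon)\mu))]+\epsilon^2/\sigma^2$. For this form, $\partial\chi^{2}/\partial\epsilon=0$ reduces to $\epsilon^2+(1+\mu\sigma^2)\epsilon+\sigma^2(\mu-m)=0$, whose root near zero is $\hat{\epsilon}=\frac{1+\mu\sigma^2}{2}(\sqrt{1-x}-1)$ with the $x$ defined above. The sketch evaluates $E(\hat{\epsilon})$ exactly by summing over the Poisson distribution of $m$, with no sampling noise. It then compares the result with the second-order term $-\sigma^4\mu/(1+\mu\sigma^2)^3$, which follows from $E(x^2)=16\sigma^4\mu/(1+\mu\sigma^2)^4$. The value $\sigma=0.05$ and the function names are arbitrary.

```python
import math


def eps_hat(m, mu, sigma):
    """Root of eps^2 + (1 + mu sigma^2) eps + sigma^2 (mu - m) = 0 that stays near zero."""
    a = 1.0 + mu * sigma**2
    return 0.5 * (-a + math.sqrt(a * a - 4.0 * sigma**2 * (mu - m)))


def expected_eps_hat(mu, sigma):
    """E[eps_hat] summed over the Poisson distribution of m."""
    m_max = int(mu + 20.0 * math.sqrt(mu) + 50)
    total = 0.0
    for m in range(m_max + 1):
        pmf = math.exp(m * math.log(mu) - mu - math.lgamma(m + 1))
        total += pmf * eps_hat(m, mu, sigma)
    return total


def second_order_bias(mu, sigma):
    """-(1 + mu sigma^2)/16 * E[x^2], with x = 4 sigma^2 (mu - m) / (1 + mu sigma^2)^2."""
    return -sigma**4 * mu / (1.0 + mu * sigma**2) ** 3


for mu in (100.0, 1000.0, 10000.0):
    print(mu, expected_eps_hat(mu, 0.05), second_order_bias(mu, 0.05))
```

In this sketch the exact expectation is negative, agrees closely with the second-order estimate, and falls toward zero at large $\mu$. This is consistent with $\hat{\epsilon}$ becoming unbiased only asymptotically.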

Appendix D Bias and covariance matrix formulas for the Gauss-likelihood chi-square

In this appendix, we provide formulas for the bias of $\hat{\mu}_{\mathrm{Gauss}}$ from the Gauss-likelihood chi-square $\chi^{2}_{\mathrm{Gauss}}$, as well as the covariance matrix format of $\chi^{2}_{\mathrm{Gauss}}$. Given the simple model described in Sec. 2, $\hat{\mu}_{\mathrm{Gauss}}$ is obtained by minimizing Eq. (6), $\partial\chi^{2}_{\mathrm{Gauss}}/\partial\mu=0$, which yields

[TABLE]

Using the covariance matrix formalism, the likelihood function in Eq. (4) becomes:

[TABLE]

where $d$ and $\lvert V\rvert$ are the dimension and determinant of the covariance matrix $V$ , respectively. Therefore, we have

[TABLE]

with $C$ being a model-independent constant, which does not play a role in estimating the model parameters.
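The sketch below assumes Eq. (6) has the form $\chi^{2}_{\mathrm{Gauss}}=\sum_i[(\mu-M_i)^2/\mu+\ln\mu]$, i.e. $-2\ln L$ of a Gaussian with variance $\mu$ up to a constant. Setting $\partial\chi^{2}_{\mathrm{Gauss}}/\partial\mu=0$ then gives $n\mu^2+n\mu-\sum_i M_i^2=0$, hence $\hat{\mu}_{\mathrm{Gauss}}=\sqrt{1/4+\sum_i M_i^2/n}-1/2$. Since $E(M_i^2)=\mu^2+\mu$, the argument of the square root has expectation $(\mu+1/2)^2$, so the bias only enters through the curvature of the square root. The sketch does two things. It compares toy-averaged biases with the Neyman and Pearson estimators in the $n=10$, $\mu_{\mathrm{true}}=15$ setting. It also checks that, for a diagonal $V$, the covariance form $\bm{r}^{T}V^{-1}\bm{r}+\ln|V|$ matches the bin-by-bin sum with the constant $C$ dropped. Seed, toy count, and function names are our own.

```python
import numpy as np


def mu_hat_gauss(M):
    """Positive root of n mu^2 + n mu - sum(M^2) = 0, per toy (rows of M)."""
    return np.sqrt(0.25 + (M.astype(float) ** 2).mean(axis=1)) - 0.5


def chi2_gauss_cov(M, mu_vec, V):
    """Covariance form r^T V^{-1} r + ln|V| (model-independent constant dropped)."""
    r = M - mu_vec
    _, logdet = np.linalg.slogdet(V)
    return float(r @ np.linalg.solve(V, r) + logdet)


rng = np.random.default_rng(21)
M = rng.poisson(15.0, size=(200_000, 10))
M = M[(M > 0).all(axis=1)]  # Neyman's estimator needs non-empty bins
biases = {
    "gauss": float(mu_hat_gauss(M).mean() - 15.0),
    "neyman": float((10 / (1.0 / M).sum(axis=1)).mean() - 15.0),
    "pearson": float(np.sqrt((M.astype(float) ** 2).mean(axis=1)).mean() - 15.0),
}
print(biases)
```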

Appendix E Improvement on model parameters other than normalization

Although only one normalization parameter is considered in the examples of Sec. 3 (i.e. the shape of the histogram is fixed), we expect the improvement to hold generally for binned histograms with models containing one or more parameters, since the CNP chi-square is a better approximation to the Poisson-likelihood chi-square for counting statistics. Below we show an example where the shape of the histogram is linear, with the slope ($p_{1}$) and the y-intercept ($p_{0}$) as two free model parameters in the fit. The example is defined as follows:

[TABLE]

where $n_{i}$ is the number of counts in the $i$th bin and $x_{i}$ is the bin center; $n_{i}$ is assumed to follow a Poisson distribution. Ten bins are considered in this example, with $x_{i}$ ranging from 0.1 to 1 in steps of 0.1. The true values of $p_{0}$ and $p_{1}$ are 8 and 20, respectively, and 10 million toy experiments are generated with this setting. The distributions of the best-fit values of $p_{0}$ and $p_{1}$ are shown in Fig. 8. The relative bias in $p_{1}$ (shape) is generally smaller than that in $p_{0}$ (normalization) for a given test statistic. Even so, the CNP chi-square yields smaller biases in both parameters, as expected.

Bibliography


[1] J. Neyman and E. S. Pearson, On the use and interpretation of certain test criteria for purposes of statistical inference: Part I, Biometrika 20A (1928) 175–240.
[2] J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231 (1933) 289–337.
[3] W. Cash, Parameter estimation in astronomy through application of the likelihood ratio, Astrophys. J. 228 (1979) 939–947.
[4] S. Baker and R. D. Cousins, Clarification of the Use of Chi Square and Likelihood Functions in Fits to Histograms, Nucl. Instrum. Meth. 221 (1984) 437–442.
[5] Particle Data Group collaboration, M. Tanabashi et al., Review of particle physics: Chapter 39. Statistics, Phys. Rev. D 98 (Aug, 2018) 030001.
[6] T. Hauschild and M. Jentschel, Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments, Nucl. Instrum. Meth. A 457 (2001) 384–401.
[7] X. Qian, A. Tan, J. J. Ling, Y. Nakajima and C. Zhang, The Gaussian CLs method for searches of new physics, Nucl. Instrum. Meth. A 827 (2016) 63–78, [arXiv:1407.5052].
[8] F. James, Statistical methods in experimental physics, World Scientific, Hackensack, USA (2006) 345 p.
