TL;DR
This paper introduces a new approximation called combined Neyman-Pearson chi-square that reduces bias in parameter estimation compared to traditional chi-square methods, offering a computationally efficient alternative to the Poisson-likelihood chi-square.
Contribution
The paper proposes the combined Neyman-Pearson chi-square as an improved approximation to the Poisson-likelihood chi-square, with analytical and simulation validation.
Findings
$ ext{CNP}$ chi-square reduces bias in parameter estimates.
$ ext{CNP}$ provides a computationally efficient alternative.
Significant bias reduction compared to Neyman's or Pearson's chi-square.
Abstract
We describe an approximation to the widely-used Poisson-likelihood chi-square using a linear combination of Neyman's and Pearson's chi-squares, namely "combined Neyman-Pearson chi-square" (). Through analytical derivations and toy model simulations, we show that leads to a significantly smaller bias on the best-fit model parameters compared to those using either Neyman's or Pearson's chi-square. When the computational cost of using the Poisson-likelihood chi-square is high, provides a good alternative given its natural connection to the covariance matrix formalism.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14| median 68% confidence interval | interval size | |
|---|---|---|
| (13.839, 16.226) | 2.387 | |
| (13.744, 16.221) | 2.478 | |
| (12.236, 15.706) | 3.471 | |
| (14.153, 16.800) | 2.647 | |
| (13.745, 16.196) | 2.451 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Combined Neyman–Pearson Chi-square: An Improved Approximation to the Poisson-likelihood Chi-square
Xiangpan Ji
Wenqiang Gu
Xin Qian
Hanyu Wei
Chao Zhang
Physics Department, Brookhaven National Laboratory, Upton, NY, USA
Abstract
We describe an approximation to the widely-used Poisson-likelihood chi-square using a linear combination of Neyman’s and Pearson’s chi-squares, namely “combined Neyman–Pearson chi-square” (). Through analytical derivations and toy model simulations, we show that leads to a significantly smaller bias on the best-fit model parameters compared to those using either Neyman’s or Pearson’s chi-square. When the computational cost of using the Poisson-likelihood chi-square is high, provides a good alternative given its natural connection to the covariance matrix formalism.
keywords:
test statistics, Poisson-likelihood chi-square, Neyman’s chi-square, Pearson’s chi-square
††journal: Nuclear Instruments and Methods A
1 Introduction
In high-energy physics experiments, it is often convenient to bin the data into a histogram with bins. The number of measured events in each bin typically follows a Poisson distribution with the mean value predicted by a set of model parameters . The likelihood function of this Poisson histogram can be written as:
[TABLE]
A maximum-likelihood estimator (MLE) of can be constructed by maximizing the likelihood ratio [1, 2]
[TABLE]
where the denominator is a model-independent constant that maximizes the likelihood of the data without any restriction on the model111While the estimation of model parameters does not depend on the denominator of the likelihood ratio, the chi-square test statistic constructed in this way, such as that in Eq. (3), can be used to examine the data-model compatibility with a goodness-of-fit test.. Maximizing this likelihood ratio is equivalent to minimizing the Poisson-likelihood chi-square function [3, 4]:
[TABLE]
The MLE is commonly used in the high-energy physics, as it is generally an asymptotically unbiased estimator, and has the advantage of being consistent and efficient [5].
At large statistics, the previous Poisson distribution can be approximated by a normal (or Gaussian) distribution with mean and variance . The likelihood then becomes:
[TABLE]
The Gauss-MLE can be similarly constructed through a likelihood ratio:
[TABLE]
where the denominator is the maximum of without any restriction on the model, and can be derived by calculating . Maximizing is equivalent to minimizing the Gauss-likelihood chi-square function
[TABLE]
While the Gauss-likelihood chi-square is relatively well-known (see e.g. [6, 7]) 222We further provide some relevant formulas for the Gauss-likelihood chi-square in D., interestingly, it is not widely used in high-energy physics experiments. Instead, a direct chi-square test statistic, namely the Pearson’s chi-square, is constructed through:
[TABLE]
Comparing with Eq. (6), we see consists of only the first term in . These two chi-squares become asymptotically equivalent when is large.
In practice, the variance is often approximated by the measured value , which is independent of the model parameters. This leads to another popular chi-square test statistic in high-energy physics experiments, namely the Neyman’s chi-square:
[TABLE]
Comparing to the MLE from the Poisson-likelihood chi-square, it is known that the estimator of model parameters constructed from Pearson’s or Neyman’s chi-square leads to biases especially when the large-statistics condition is not met [4, 8, 9]. Despite this shortcoming, both and are commonly used in physics data analysis, partly because of their close connection to the covariance-matrix formalism:
[TABLE]
where is the covariance matrix of the prediction, and can often be calculated through Monte Carlo methods based on the statistical and systematic uncertainties of the experiment prior to the minimization of . In situations where many nuisance parameters [5] are required in the likelihood function as in Eq. (1), the covariance matrix format Eq. (9) has a natural advantage of reducing the number of nuisance parameters, thus leads to a faster minimization of the function.
One method to remove the bias of the estimator from is through an iteration of the weighted least-squares fit, where the variance in one round of minimization is replaced by the prediction from the best-fit value in the previous round of iteration [10, 11, 12]. Several modified chi-square test statistics have also been proposed in past literatures to mitigate the bias issue. For example, defined in Eq. (6) is a good replacement of when the number of measurements is large. Similarly, as proposed by Mighell [13] is a good alternative to when the number of measurements is large. Both and , however, still lead to biases when the number of measurements is small. Redin proposed a solution by including a cubic term in and [14], or by reporting a weighted average of fitting results from and [15].
In this paper, we propose a new method through the construction of a chi-square test statistic () with a linear combination of Neyman’s and Pearson’s chi-squares. As an improved approximation to the Poisson-likelihood chi-square with respect to either Neyman’s or Pearson’s chi-square, the significantly reduces the bias while keeping the advantage of the covariance matrix formalism. This paper is organized as follows. The construction of and its covariance matrix format is described in Sec. 2. Three toy examples are presented in Sec. 3 to illustrate the features and advantages of . Finally, we summarize the recommended usage in data analysis of counting experiments in Sec. 4.
2 Combined Neyman–Pearson Chi-square ()
The bias in the estimator of model parameters using Neyman’s or Pearson’s chi-square can be traced back to the different definitions in approximating the Poisson-likelihood chi-square. To illustrate this, we start with a simple example. A set of independent counting experiments were performed to measure a common expected value . Each experiment measured events. The three chi-square functions in this case are 333The treatment for bins where is described in A.:
[TABLE]
(the estimator of ) can be calculated through the minimization of Eq. (10): . We obtain:
[TABLE]
Given Eq. (11), it is straightforward to show that , where the equal sign is only established when all values of are the same. Since is unbiased in this simple example, we see that and are biased in the opposite directions.
We further examine the difference in chi-square values. Assuming that and are reasonably large so that is close to , a Taylor expansion of yields:
[TABLE]
From Eq. (12), it is straightforward to deduce:
[TABLE]
Naturally, we can define a new chi-square function as a linear combination of Neyman’s and Pearson’s chi-squares:
[TABLE]
which is approximately equal to up to , better than either or alone. In this example, the estimator from minimizing can be derived as:
[TABLE]
which is the geometric mean of two and one . Since the bias of and are in the opposite directions, it is easy to see that has a reduced bias.
More generally, when model parameters and systematic uncertainties are included, the can be written as:
[TABLE]
where are model parameters, and are nuisance parameters representing systematic uncertainties constrained with their corresponding standard deviations (). As an improved approximation to , in Eq. (16) will naturally lead to a reduced bias in estimating model parameters , such as the normalization or the shape of the histograms, than using or .
It is worth noting that in , the variance of the Gaussian distribution for the th bin is approximated as , while for and they are and , respectively. From this we can further deduce the covariance matrix format of the . Following Ref. [16], when can be approximated as being linearly dependent on nuisance parameters: , the chi-square format with pull terms (e.g. Eq. 16) is equivalent to the chi-square in the covariance matrix format (Eq. 9). In this case, the covariance matrix can be written as
[TABLE]
Therefore, the covariance matrix format of becomes:
[TABLE]
where
[TABLE]
Note that in Eq. (19) we have approximated by fixing the nuisance parameters at their externally constrained (i.e. nominal) values. This is necessary because the above derivation requires that uncertainties must be independent of the nuisance parameters [16].
While the biases of Neyman’s and Pearson’s chi-squares are well-known [4, 8, 9], the construction of is, interestingly, new. This could be partially caused by the fact that in low-statistics experiments where the use of or leads to a high bias, the Poisson-likelihood chi-square is generally used instead. , however, provides certain advantages in situations where either the number of nuisance parameters is too high, or the likelihood function is analytically difficult to write. In the next section, we demonstrate the features and advantages of with three toy examples of increasing complexity. Before that, below we briefly discuss the expected performance of regarding two other common properties of a test statistic: the goodness of fit and the interval estimation.
2.1 Goodness of fit
In a goodness-of-fit test, the test statistic (e.g. ) is evaluated at the estimator (i.e. the best-fit value of ). Assuming its distribution following a chi-square distribution with the corresponding number of degrees of freedom, a p-value can be calculated to evaluate the compatibility between the data and the model. Although can be used to perform such a test, it does not hold a particular advantage over the preferred choice of [6]. As shown in Fig. 3 in Sec. 3.1, the distributions of , , , , and all deviate from the ideal chi-square distribution at low values of , while deviates the least. In addition, the mean of the distribution equals to the number of degrees of freedom at all ’s. Therefore, following Ref. [6], we recommend to use together with the least-biased estimator (from e.g. or ) to perform the goodness-of-fit test.
2.2 Interval estimation
It is well known that the construction of confidence intervals in the frequentist approach not only depends on the choice of test statistics , but also on its actual procedure. Within the high-energy physics community, there are two popular procedures in setting the confidence intervals, which we describe below.
The first procedure is based on the Wilks’ theorem [17]. The confidence interval is set by placing a certain threshold on the distribution of , where , , and are the parameter of interest, the test statistic evaluated at , and the global minimum of the for all model parameters, respectively. Under the conditions that i) the two hypotheses are nested, ii) the parameters of the larger hypothesis (e.g. ) are all uniquely defined in the smaller hypothesis (e.g. ), and not on the limits of the allowed region, and iii) data are asymptotic, Wilks proves that the negative-two-log-likelihood-ratio test statistic follows a chi-square distribution and the estimator follows a normal distribution centered around the true value . Consequently, the threshold can be conveniently calculated. For instance, the threshold for the 68%, 95%, and 99.7% confidence intervals are 1, 4, and 9, respectively, assuming follows a chi-square distribution with one degree of freedom. With this procedure, the correctness of the confidence interval coverage depends on the validity of the Wilks’ theorem. As demonstrated in Eq. (13) and Eq. (14), is an improved approximation to the negative-two-log-likelihood-ratio of the Poisson distribution (i.e. ), and it leads to a reduced bias in the estimator compared to those from and . Therefore, the conditions of the Wilks’ theorem are better met with , which means the the chi-square distribution is a better approximation to the distribution from . Fig. 5 in Sec. 3.1 shows one such example. Consequently, we expect a more proper coverage of the confidence interval using when compared to those using or under this procedure.
The second procedure is commonly referred to as the Feldman-Cousins approach [18] in the high-energy physics community. In this procedure, the construction of the confidence interval strictly follows a frequentist definition (Neyman construction) with an ordering principle based on the value of the likelihood-ratio test statistic (i.e. with for counting experiments) to ensure a proper frequentist coverage. Sec. 3.1 shows an example of this procedure with a toy experiment. Similarly, the procedure can be defined with an ordering principle based on other test statistics (e.g. , , or ), and the constructed confidence intervals would also have proper coverages in general. In this case, while all of the coverages are proper, a better test statistic is expected to yield a smaller confidence interval in size (or area, volume). As shown in Table. 1 of Sec. 3.1, the confidence interval constructed using is smaller than those using and . This is partially caused by the reduced bias in the estimator using , as will be further discussed in Sec.3.1.
We should note that there are other procedures to set confidence intervals that are less affected by certain properties of the test statistics. For example, since the bias () of an estimator can be evaluated with a Monte Carlo method, one can define an alternative test statistic with . Naturally, the confidence interval constructed using with either the thresholding approach based on the Wilks’ theorem or the Feldman-Cousins approach would be less affected by the bias, and performs better than that of at the cost of increased computation.
3 Performance of
In this section, we compare the performance of , , , and with three toy examples. While we focus on the issue of bias, we also provide comparison results of the goodness-of-fit test and the interval estimation in the first example to support the discussion in Sec. 2.1 and Sec. 2.2. For completeness of the discussion, we add , which has a similar performance to in certain scenarios, to the comparison in the first example.
3.1 Example 1: simple counting
The first example is similar to the one introduced in Sec. 2. In each toy experiment, a set of independent counting measurements were performed to measure a common expected value . The curves with and of one simulated toy experiment is shown in the left panel of Fig. 1. The minimum location of the curve represents the estimator . It is clear that and the CNP chi-square curve closely resembles the Poisson-likelihood chi-square as demonstrated in the previous section.
The relative biases of using , , , and are shown in the right panel of Fig. 1 with 10 million toy experiments. The bias using is zero. The biases using and have opposite signs. The magnitude of mean bias using is about twice of that using . The bias using is an order of magnitude smaller than those using and . The bias using is similar to . The variance of is notably larger than those of the other four test statistics, which are similar.
In Fig. 2, we further study the biases of with different values of and the number of measurements . The biases using are always zero as expected from an unbiased estimator in this simple example. The biases using , , and become larger as the number of measurements increases. This behavior may not be intuitive, but is well known and the proof is provided in B. As and increases, the biases of and approach and , respectively. Beside these observations, the general features of the biases stay the same as discussed previously. Most importantly, yields a much smaller bias than or in all occasions.
Figure 2 also shows the performance of , which is another way to mitigate the bias issue. Similar to , performs much better than or . We note that the bias of is less dependent on , and becomes smaller when increases. This is expected from the central limit theorem, which states that the sum of a large number of identically distributed random variables follows a normal distribution. Therefore, when the number of measurements is large, provides a better performance even when is small. On the other hand, when number of measurements is not large, shows a better performance.
Next, we compare the performance on the goodness-of-fit test. The left panel of Fig. 3 shows the distribution of the five test statistics evaluated at in the setting with 10 million toy experiments. The ideal chi-square distribution with 10 degrees of freedom is also shown for comparison. All five test statistics deviate from the ideal chi-square distribution, with being the closest and deviating the most. The mean of is exactly 10, and the mean of is the largest. The right panel of Fig. 3 shows the relative deviation of the mean to the number of degrees of freedom ( in all toy experiments) for the five test statistics as a function of . It is clear that except for , the other four test statistics are not ideal in this metric when is less than a few tens, with being the worst. Ref. [6] provides a good discussion on this behavior.
In practice, is unknown and experiments often report (evaluated at ) as a metric for the goodness-of-fit test. The left panel of Fig. 4 shows the results of this test for the same setting of as in Fig. 3. Note that when is evaluated at , the number of degrees of freedom is decreased by one (ndf = 9). We see that all five test statistics yield poor results in this goodness-of-fit metric when is less than 10, indicating large deviations from the chi-square distribution in those cases. On the other hand, inspired by Fig. 3, we can use to perform the goodness-of-fit test, but evaluate it at a obtained from a different test statistic. The right panel of Fig. 4 shows the results. We see that when is evaluated at a less-biased estimator , (e.g. , , or ), it results in a better metric for the goodness-of-fit test, which confirms our recommendation in Sec. 2.1.
To compare the performance on the interval estimation, Fig. 5 shows the distribution in the and setting with 10 million toy experiments, where . As discussed in Sec. 2.2, when the conditions of the Wilks’ theorem [17] are met, it is expected that in this example follows the chi-square distribution with one degree of freedom. However, except for , the other four test statistics all clearly deviate from the ideal distribution leading to improper coverages when using the rule to set the 68% confidence intervals. Therefore, we follow the Feldman-Cousins approach [18] to construct the 68% confidence interval instead. First, a scan of values is performed. Setting each test as the true value, many toy experiments are generated to obtain its distribution. Then, from each distribution, a critical value can be determined such that below it the distribution contains 68% of the toy experiments. For example, given the distributions shown in Fig. 5, the critical values for , , , and are larger than one, which is the result of their biases in . Finally, returning to the original toy experiments with the setting, for each toy experiment we can set its confidence interval by comparing its value with the critical value at each test value. The 68% confidence interval is constructed to contain all the test values that have . For each of the 10 million toy experiments, this procedure is repeated to obtain its 68% confidence interval. The reported lower limit and upper limit of the 68% confidence interval are the median values over all toy experiments and tabulated in Table. 1. As shown, and have similar (average) interval sizes, both larger than that of but quite smaller than those of and . There are two reasons causing the larger interval size of and . First, has a notably larger variance as shown in Fig. 1. Second, since is always contained in the ensemble median of confidence intervals (but not necessarily near the center) by construction444In a frequentist definition of the 68% confidence interval (C.I.), if one performs a large number of similar experiments, the interval would contain in 68% of the cases. This means the lower limit of the 68% C.I. would be lower than in at least 68% of the experiments, therefore the median of the lower limit of the 68% C.I., , is always lower than . Similarly, the median of the upper limit of the 68% C.I., , is always higher than ., the larger biases of and also contribute to their larger interval sizes.
Next, we show two more examples with increasing complexity inspired by real experiments. Since generally have a similar performance as and can also benefit from the covariance matrix formalism, we restrict our comparisons of to , , and . The following study will focus on the bias of the point estimation of model parameters, since the performance on the goodness-of-fit test and the interval estimation is similar to the first example.
3.2 Example 2: fitting multi-detector histograms
In this section, we introduce a more realistic example, which is inspired by the PROSPECT reactor neutrino experiment [19] searching for a light sterile neutrino [20]. One of the unique features of PROSPECT is that the detector consists of many segmented sub-detectors, and the number of events in each sub-detector is not high (few hundreds). Since each sub-detector has a different baseline to the reactor, it is desirable to treat each sub-detector separately in the spectrum fitter to increase the physics sensitivity to the energy- and baseline-dependent oscillation effect caused by a hypothetical light sterile neutrino.
In our toy example experiment, we assume 100 sub-detectors, each measures a common energy spectrum with 16 energy bins. The expected spectrum is assumed to be flat with an unknown normalization bias factor to be determined555E shows an example where the shape of the histogram is also a model parameter.. In the th bin of the th detector, signal events and background events are expected, and total events are measured. The background shape is also assumed to be flat and the expected background is assumed to be half of the expected signal in size. The experiment also measured background events in a signal-off period, which provided an external constraint on the background. For simplicity we assume the length of the signal-off period is the same as the signal-on period. We consider one systematic uncertainty, the relative normalization uncertainty among detectors, and assume it to be constrained to 2%. Therefore, in this example, there is one model parameter , and 1700 nuisance parameters (, ) to be estimated.
The Poisson-likelihood chi-square function for this toy experiment can be written as:
[TABLE]
and for the CNP chi-square:
[TABLE]
The and can be constructed similarly by changing the denominators of the first and the second terms in Eq. (21).
Minimizing the above chi-square functions involves finding the best-fit values of the 1700 nuisance parameters, which could cause instabilities of the fitter. To reduce the number of nuisance parameters, we can find their best-fit values by solving the corresponding differential equations, e.g. . In this simple example, since the nuisance parameters are independent of each other, this equation is linear for , quadratic for , quartic for , and quintic for . The solutions to these equations can be found either analytically ( order) or numerically ( order).
One hundred thousand toy experiments are simulated assuming the nominal signal and background in each bin. The normalization bias factor is fitted for each experiment, where the true value of is set to zero. The results of using , , and are shown in Fig. 6. Despite being small, the bias of is non-zero. This is caused by the introduction of penalty terms in Eq. (20) (see C for an explanation). One can see that the bias of is again much smaller than those of and , representing a much better approximation to .
3.3 Example 3: covariance matrix implementation
In many physics experiments, covariance matrix is used to model complicated systematic uncertainties, where either direct nuisance parameter implementation is difficult, or there are too many nuisance parameters to minimize. In this section, we show how the can be implemented in a covariance matrix format.
We introduce a slight complication to the previous example so that the analytic or numerical methods to find best-fit values are prohibitively difficult in the minimization. In this example, we assume the detector response changed between the signal-on and the signal-off period, and in order to interpolate the expected background in the signal-off period to the signal-on period, a transfer matrix is needed such that . For simplicity, 10 sub-detectors are used in this example, and the transfer matrix does a simple smearing in energy bins such that for each detector when , when , and everywhere else. The in this example becomes:
[TABLE]
In this case, solving for the nuisance parameters through would lead to a set of quintic equations, which is difficult to solve either analytically or numerically. Following Sec. 2, the covariance matrix format of Eq. (22) is:
[TABLE]
where , , and are ordered into a single 160-element vector , , , , respectively. is the covariance matrix corresponding to the statistical uncertainty, which is diagonal with its elements being the corresponding values in the denominator of the first term of Eq. (22). Similarly , is the covariance matrix corresponding to the background statistical uncertainty with the diagonal elements defined by the denominator of the second term in Eq (22). is the covariance matrix corresponding to the systematic uncertainty , which can be calculated either analytically or from toy Monte Carlo simulations by randomly fluctuating the number of events according to and its constraint.
Following the same procedure, covariance matrix formats can be constructed for and by replacing the statistical uncertainty terms in the covariance matrix in Eq. (23), and , to their corresponding values in and . We note that there is no equivalent covariance matrix format for the Poisson-likelihood chi-square. One hundred thousand toy experiments are simulated assuming the nominal signal and background in each bin. The normalization bias factor is fitted for each experiment, where the true value of was set to zero. The results are shown in the left panel of Fig. 7. We see that in the covariance format, the bias of is again more than an order of magnitude smaller than those of and .
We emphasize that in the defined in Eq. (23), both the free parameter and the nuisance parameters need to be minimized. This is due to the nature of the Poisson statistical uncertainty of the background, and how it is treated in the CNP chi-square. It is tempting to further reduce the number of nuisance parameters by absorbing them into a fixed covariance matrix. In order to do so, we need to approximate the expected with their measured value . In this case, Eq. (22) and (23) are replaced by
[TABLE]
and
[TABLE]
where the nuisance parameters are absorbed into . After these approximations, in , only one free parameter instead of 161 fitting parameters in Eq. (23) needs to be minimized and the computational cost is largely reduced. Similar approximations can be used for and the fitting results are shown in right panel of Fig. 7. We see that although the approximation leads to a much reduced number of fitting parameters, the bias of the normalization factor becomes significantly larger, in particular for the CNP-chi-square. It is therefore crucial to indicate clearly how the is defined, and what approximations are implied in the construction of the covariance matrix when reporting results.
4 Discussions
Through examples in the previous section, we have compared various chi-square construction methods and different minimization strategies. In the following, we provide some recommendations on when to use them in the data analysis of counting experiments:
When the computational cost is not a concern (e.g. number of nuisance parameters is small), a direct minimization of the Poisson-likelihood chi-square (with nuisance parameters implementing through pull terms) should be used.
- 2.
When the computational cost of a direct minimization is high, one should first look for analytic or numerical solutions, which can effectively reduce the number of nuisance parameters without making any approximations. For example, the number of nuisance parameters of the Poisson-likelihood chi-square in the example described in Sec. 3.2 can be reduced by solving a set of independent quadratic equations.
- 3.
When analytic or numerical solutions are not available, approximations may become necessary to reduce the computational cost. In this case, the covariance matrix formalism is a common tool in reducing the number of nuisance parameters. However, before approximating the Poisson-likelihood chi-square by Neyman’s, Pearson’s, Gauss-likelihood, or CNP chi-squares, one can examine if it is sufficient to apply covariance matrix only to the pull terms of the systematic uncertainties. For example, the rate plus shape oscillation fit described in Ref. [21] used a covariance matrix in the pull term for reactor-related uncertainties. In this approach, the statistical part of the chi-square function can still use the Poisson-likelihood format.
- 4.
When the Poisson-likelihood chi-square has to be replaced, the iterative approach with the weighted least-squares as described in Ref. [10, 11, 12] can be an option to eliminate the bias in the estimator. An alternative approach is the CNP or the Gauss-likelihood chi-square, which both lead to a much reduced bias in estimating model parameters than using either Neyman’s or Pearson’s chi-square. As shown in Fig. 2 of Sec. 3.1, the CNP or the Gauss-likelihood chi-square could be the better choice of test statistics depending on the number of measurements. In addition, the improved confidence intervals (smaller in size or with more proper coverage) are often accompanied with the reduced bias as discussed in Sec. 2.2 and shown in Sec. 3.1. Similarly, analytic or numerical solutions should be explored before applying a covariance matrix approach, since additional approximations are necessary in the later case. As shown in Sec. 2, the derivation of covariance matrix formula assumes i) the variance describing statistical fluctuations has to be independent of any nuisance parameters, and ii) the predicted counts only have a linear dependence on the nuisance parameters. For example, the approximation made in the right panel of Fig. 7 leads to a significant bias.
We emphasize that since there are many different ways to make approximations in defining the chi-square test statistics, it is extremely important for experiments to clearly report how their test statistics are constructed.
In summary, we proposed a linear combination of Neyman’s and Pearson’s chi-squares, , as an improved approximation to the widely-used Poisson-likelihood chi-square in counting experiments. With three examples, we show that the bias in parameter estimation from using CNP chi-square is much smaller than those using the Neyman’s or Pearson’s chi-square alone. In occasions where the computational cost of using Poisson-likelihood chi-square is high, the CNP chi-square with its covariance matrix format provides a good alternative.
Acknowledgments
We thank Maxim Gonchar and Mike Shaevitz for suggesting the comparison of the CNP chi-square with the Gauss-likelihood chi-square. This work is supported by the U.S. Department of Energy, Office of Science, Office of High Energy Physics, and Early Career Research Program under contract number DE-SC0012704.
Appendix A Treatment of bins with zero observed events
Experiments can often have bins with zero counts when the expected signal is small. In this case, the Neyman’s chi-square definition, Eq. (8), breaks down since the measured number of events is in the denominator, so are the CNP and Gauss-likelihood chi-square definitions. Practical approximations are often made in experiments by either ignoring bins with zero observation, or assign the statistical uncertainty as 1 for zero-count bins (e.g. the “modified Neyman’s chi-square” [6]). Here we adopt the Poisson-likelihood chi-square definition for zero-count bins:
[TABLE]
Eq. (26) can be re-written in a weighted least-squares format:
[TABLE]
Compared with the Pearson’s chi-square, we see that the variance is half of for zero-count bins. The covariance matrix element corresponding to a zero-count bin follows:
[TABLE]
In this paper, we use Eq. (26) and (28) in all occasions when zero-count bins are encountered.
Appendix B Bias of estimator and versus number of measurements
Here we prove that the bias of and increases as the number of measurements increases, as shown in Fig. 2. Making use of the relations
[TABLE]
for we have:
[TABLE]
where since follows a Poisson distribution. The expected bias then becomes:
[TABLE]
which deviates further from zero when increases. The bias approaches -1 when and become large. 666Note that for the dependence on , Eq. (31) is only asymptotically correct when and are large due to the approximation made in Eq. (29). The actual dependence on when can only be written as an infinite summation (e.g. ). One derivation can be found in Ref [13].
Similarly, for we have:
[TABLE]
therefore:
[TABLE]
which also becomes larger at larger , since the variance of becomes smaller at larger . The bias approaches 1/2 when and become large.
Appendix C Bias of when pull terms are included
In this appendix, we provide an explanation of the non-zero bias of from when pull terms are included, for example, in Eq. (20). Let us consider a simplified example. One experiment measured number of events, which follows Poisson-distribution with the mean value of . There is one systematic uncertainty () on the normalization of , which is constrained with standard deviation of . Following maximum-likelihood principle, the Poisson-likelihood chi-square with the constraint on is:
[TABLE]
The estimator of () can be derived through the minimization of chi-square: :
[TABLE]
Defining and assuming , we can perform a Taylor expansion on Eq. (35) and obtain:
[TABLE]
Ignoring higher-order terms, the expectation of is
[TABLE]
Given that is zero and is non-zero, we see that in this example is a biased estimator. only asymptotically becomes unbiased under large statistics [5].
Appendix D Bias and covariance matrix formulas for the Gauss-likelihood chi-square
In this appendix, we provide formulas on the bias of from the Gauss-likelihood chi-square , as well as the covariance matrix format of . Given the simple model described in Sec. 2, can be obtained through the minimization of Eq. (6): , yielding
[TABLE]
Using the covariance matrix formalism, the likelihood function in Eq. (4) becomes:
[TABLE]
where and are the dimension and determinant of the covariance matrix , respectively. Therefore, we have
[TABLE]
with being a model-independent constant, which does not play a role in estimating the model parameters.
Appendix E Improvement on model parameters other than normalization
Although in our examples in Sec. 3, only one normalization parameter is considered (i.e. the shape of the histogram is fixed), since the CNP chi-square is a better approximation to the Poisson-likelihood chi-square for counting statistics, we expect the improvement is general for any binned histograms with models including one or more parameters. Below we show an example where the shape of the histogram is linear, with the slope () and the y-intercept () being two free model parameters in the fit. The example is defined as follows:
[TABLE]
where is the number of counts in the i-th bin, and is the value of the bin center. is assumed to follow a Poisson distribution. 10 bins are considered in this example and ranges from 0.1 to 1 with a step of 0.1. The true values of and are assumed to be 8 and 20, respectively. 10 million toy experiments are generated according to this setting. The distribution of best-fit values of and are shown in Fig. 8. While the relative bias in (shape) is generally smaller than that of (normalization) given a chosen test statistic, the CNP chi-square yields smaller biases in both parameters as expected.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] J. Neyman and E. S. Pearson, On the use and interpretation of certain test criteria for purposes of statistical inference: Part i , Biometrika 20A (1928) 175–240.
- 2[2] J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical hypotheses , Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 231 (1933) 289–337.
- 3[3] W. Cash, Parameter estimation in astronomy through application of the likelihood ratio , Astrophys. J. 228 (1979) 939–947 . · doi ↗
- 4[4] S. Baker and R. D. Cousins, Clarification of the Use of Chi Square and Likelihood Functions in Fits to Histograms , Nucl. Instrum. Meth. 221 (1984) 437–442 . · doi ↗
- 5[5] Particle Data Group collaboration, M. Tanabashi et al., Review of particle physics: Chapter 39. statistics , Phys. Rev. D 98 (Aug, 2018) 030001 . · doi ↗
- 6[6] T. Hauschild and M. Jentschel, Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments , Nucl. Instrum. Meth. A 457 (2001) 384–401.
- 7[7] X. Qian, A. Tan, J. J. Ling, Y. Nakajima and C. Zhang, The Gaussian CL s method for searches of new physics , Nucl. Instrum. Meth. A 827 (2016) 63–78 , [ ar Xiv:1407.5052 ]. · doi ↗
- 8[8] F. James, Statistical methods in experimental physics , Hackensack, USA: World Scientific (2006) 345 p (2006) .
