A binned likelihood for stochastic models

Carlos A. Argüelles; Austin Schneider; Tianlu Yuan

arXiv:1901.04645·physics.data-an·June 26, 2019


TL;DR

This paper introduces a new analytic likelihood method that incorporates Monte Carlo uncertainties, improving model assessment accuracy in complex systems with limited or large datasets.

Contribution

It presents a novel likelihood formulation that accounts for Monte Carlo uncertainties, enhancing statistical inference in complex stochastic models.

Findings

01. Performs better than semi-analytic methods

02. Prevents biased statistical claims

03. Provides improved coverage properties

Abstract

Metrics of model goodness-of-fit, model comparison, and model parameter estimation are the main categories of statistical problems in science. Bayesian and frequentist methods that address these questions often rely on a likelihood function, which is the key ingredient in order to assess the plausibility of model parameters given observed data. In some complex systems or experimental setups, predicting the outcome of a model cannot be done analytically, and Monte Carlo techniques are used. In this paper, we present a new analytic likelihood that takes into account Monte Carlo uncertainties, appropriate for use in the large and small sample size limits. Our formulation performs better than semi-analytic methods, prevents strong claims on biased statements, and provides improved coverage properties compared to available methods.


Tables

Table 1. Best-fit parameters. For the toy experiment shown in Fig. 4, best-fit parameters using $\mathcal{L}_{\textmd{AdHoc}}$ and $\mathcal{L}_{\textmd{Eff}}$ are shown. The columns in the table are for the different MC sizes. The two numbers in parentheses in each entry correspond to $\Omega$ and $\Phi$, respectively.

Likelihood	$N_{\mathrm{MC}} = 10^{4}$	$N_{\mathrm{MC}} = 10^{5}$	$N_{\mathrm{MC}} = 10^{6}$
$\mathcal{L}_{\textmd{AdHoc}}$	$(127.0, 6368.0)$	$(124.7, 5655.7)$	$(125.1, 4888.5)$
$\mathcal{L}_{\textmd{Eff}}$	$(127.1, 6077.1)$	$(124.7, 5576.0)$	$(125.1, 4889.4)$

Table 2. Suppl. Table 1: Table of likelihood formulas. The likelihood functions discussed in this paper are given in each row. They are written in terms of $\mu$ and $\sigma$, whose explicit formulas are given in the top row, and the number of observed events, $k$, in the bin. In the case of $\mathcal{L}_{\textmd{BB}}$ we write the likelihood for the single-process case. Our main result and recommended likelihood, $\mathcal{L}_{\textmd{Eff}}$, is given in the last row.

Parameters	$\mu \equiv \sum_{i=1}^{m} w_{i}, \quad \sigma^{2} \equiv \sum_{i=1}^{m} w_{i}^{2}$
$\mathcal{L}_{\textmd{AdHoc}}$	$\frac{\mu^{k} e^{-\mu}}{k!}$
$\chi^{2}_{\rm mod}$	$\frac{(k-\mu)^{2}}{\mu+\sigma^{2}}$
$\mathcal{L}_{\textmd{BB}}^{s=1}$	$\max_{\bar{m}} \frac{1}{k!\,m!} \left(\frac{\mu\bar{m}}{m}\right)^{k} \bar{m}^{m} e^{-\frac{\mu\bar{m}}{m}-\bar{m}}$
$\mathcal{L}_{\textmd{Mean}}$	$\left(\frac{\mu}{\sigma^{2}}\right)^{\frac{\mu^{2}}{\sigma^{2}}} \Gamma\left(k+\frac{\mu^{2}}{\sigma^{2}}\right) \left[k! \left(1+\frac{\mu}{\sigma^{2}}\right)^{k+\frac{\mu^{2}}{\sigma^{2}}} \Gamma\left(\frac{\mu^{2}}{\sigma^{2}}\right)\right]^{-1}$
$\mathcal{L}_{\textmd{Eff}}$	$\left(\frac{\mu}{\sigma^{2}}\right)^{\frac{\mu^{2}}{\sigma^{2}}+1} \Gamma\left(k+\frac{\mu^{2}}{\sigma^{2}}+1\right) \left[k! \left(1+\frac{\mu}{\sigma^{2}}\right)^{k+\frac{\mu^{2}}{\sigma^{2}}+1} \Gamma\left(\frac{\mu^{2}}{\sigma^{2}}+1\right)\right]^{-1}$

Equations

$$\mathcal{L}(\vec{\theta}|k)=\mathrm{Poisson}(k;\lambda(\vec{\theta}))=\frac{\lambda(\vec{\theta})^{k}e^{-\lambda(\vec{\theta})}}{k!}, \tag{1}$$

$$\mathcal{L}_{\textmd{AdHoc}}(\vec{\theta}|k)=\frac{\left(\sum_{i}w_{i}(\vec{\theta})\right)^{k}e^{-\sum_{i}w_{i}(\vec{\theta})}}{k!}. \tag{2}$$

$$\lambda(\vec{\theta})=\sum_{j=1}^{s}\bar{n}_{j}(\vec{\theta}), \tag{3}$$

$$\mathcal{L}_{\textmd{BB}}(\vec{\theta}|k)=\max_{\{\bar{n}_{j}\}}\frac{\lambda(\vec{\theta})^{k}e^{-\lambda(\vec{\theta})}}{k!}\prod_{j=1}^{s}\frac{\bar{n}_{j}^{n_{j}}e^{-\bar{n}_{j}}}{n_{j}!}, \tag{4}$$

$$\lambda(\vec{\theta})=\sum_{j=1}^{s}\eta_{j}(\vec{\theta})\bar{n}_{j}, \tag{5}$$

$$\chi^{2}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})}, \tag{6}$$

$$\chi^{2}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})+\sigma_{\rm syst.}^{2}}. \tag{7}$$

$$\chi^{2}_{\rm mod}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})+\sigma_{\rm syst.}^{2}+\sigma_{\rm mc}^{2}}, \tag{8}$$

$$\sigma_{\rm mc}^{2}(\vec{\theta})\equiv\sum_{i=1}^{m}w_{i}(\vec{\theta})^{2}. \tag{9}$$

$$\mathcal{L}_{\rm General}(\vec{\theta}|k)=\int_{0}^{\infty}\frac{\lambda^{k}e^{-\lambda}}{k!}\,\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))\,d\lambda, \tag{10}$$

$$\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))=\frac{\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))\,\mathcal{P}(\lambda)}{\int_{0}^{\infty}\mathcal{L}(\lambda^{\prime}|\vec{w}(\vec{\theta}))\,\mathcal{P}(\lambda^{\prime})\,d\lambda^{\prime}}, \tag{11}$$

$$\mu\equiv\sum_{i=1}^{m}w_{i}\quad\textmd{and}\quad\sigma^{2}\equiv\sum_{i=1}^{m}w_{i}^{2} \tag{12}$$

$$\mu=wm,\quad\sigma^{2}=w^{2}m,\quad w=\sigma^{2}/\mu,\quad\textmd{and}\quad m=\mu^{2}/\sigma^{2}. \tag{13}$$

$$\mathrm{Poisson}(M=m;\bar{m})=\frac{e^{-\bar{m}}\bar{m}^{m}}{m!}, \tag{14}$$

$$\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))=\mathcal{L}(\lambda|\mu,\sigma)=\frac{e^{-\lambda\mu/\sigma^{2}}(\lambda\mu/\sigma^{2})^{\mu^{2}/\sigma^{2}}}{(\mu^{2}/\sigma^{2})!}, \tag{15}$$

$$\mu=w_{\mathrm{Eff}}m_{\mathrm{Eff}}\quad\textmd{and}\quad\sigma^{2}=w_{\mathrm{Eff}}^{2}m_{\mathrm{Eff}}, \tag{16}$$

$$\mathcal{L}(\bar{m}|m_{\mathrm{Eff}})=\frac{e^{-\bar{m}}\bar{m}^{m_{\mathrm{Eff}}}}{\Gamma(m_{\mathrm{Eff}}+1)}, \tag{17}$$

$$\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))=\mathcal{L}(\lambda|\mu,\sigma)=\frac{e^{-\lambda\mu/\sigma^{2}}(\lambda\mu/\sigma^{2})^{\mu^{2}/\sigma^{2}}}{\Gamma(\mu^{2}/\sigma^{2}+1)}, \tag{18}$$

$$\mathrm{E}[w_{\mathrm{Eff}}M]=\mu, \tag{19}$$

$$\mathrm{Var}[w_{\mathrm{Eff}}M]=\sigma^{2}. \tag{20}$$

$$\alpha=\frac{\mu^{2}}{\sigma^{2}}+1\quad\textmd{and}\quad\beta=\frac{\mu}{\sigma^{2}}. \tag{21}$$

$$\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))=\frac{e^{-\lambda\beta}\lambda^{\alpha-1}\beta^{\alpha}}{\Gamma(\alpha)} \tag{22}$$

$$=\mathcal{G}(\lambda;\alpha,\beta), \tag{23}$$

$$\mathcal{L}_{\textmd{Eff}}(\vec{\theta}|k)=\frac{\beta^{\alpha}\Gamma(k+\alpha)}{k!\,(1+\beta)^{k+\alpha}\Gamma(\alpha)} \tag{24}$$

$$=\left(\frac{\mu}{\sigma^{2}}\right)^{\frac{\mu^{2}}{\sigma^{2}}+1}\Gamma\left(k+\frac{\mu^{2}}{\sigma^{2}}+1\right)\left[k!\left(1+\frac{\mu}{\sigma^{2}}\right)^{k+\frac{\mu^{2}}{\sigma^{2}}+1}\Gamma\left(\frac{\mu^{2}}{\sigma^{2}}+1\right)\right]^{-1}, \tag{25}$$

$$\alpha=\frac{\mu^{2}}{\sigma^{2}}+a\quad\textmd{and}\quad\beta=\frac{\mu}{\sigma^{2}}+b. \tag{26}$$

$$\lim_{m\to\infty}\frac{\sigma_{\rm identical}}{\mu_{\rm identical}}=\lim_{m\to\infty}\frac{1}{\sqrt{m}}=0. \tag{27}$$

$$\lim_{m\to\infty}\frac{\sigma}{\mu}=\lim_{m\to\infty}\frac{1}{\sqrt{m}}\frac{\sqrt{\langle w^{2}\rangle_{m}}}{\langle w\rangle_{m}}, \tag{28}$$

$$\mathcal{P}(\vec{\theta}|k)\propto\mathcal{L}(\vec{\theta}|k)\,\pi(\vec{\theta}),$$


Taxonomy

Topics: Forecasting Techniques and Applications · Statistical Methods and Bayesian Inference · Statistical Methods and Inference

Full text


A binned likelihood for stochastic models

C. A. Argüelles (a; ORCID: 0000-0003-4186-4182), A. Schneider (b; ORCID: 0000-0002-0895-3477), T. Yuan (b; ORCID: 0000-0002-7041-5872)

(a) Dept. of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

(b) Dept. of Physics and Wisconsin IceCube Particle Astrophysics Center, University of Wisconsin, Madison, WI 53706, USA


Keywords: Likelihood, Monte Carlo, Poisson distribution

1 Introduction

The use of Monte Carlo (MC) techniques to calculate nontrivial theoretical quantities and expectations in complex experimental settings is common practice in particle physics. An MC event is a single representation of what can be detected in data and is typically generated from a single realization of the underlying physics parameters, $\vec{\theta}_{g}$. These events are often binned in some observable space and compared with the data. Since the generation process is stochastic, a particular $\vec{\theta}_{g}$ used for generating the MC can lead to different outputs. This stochasticity introduces an uncertainty in the MC distributions. Furthermore, as production of large MC samples is often time-consuming, reweighting is used to move from one hypothesis to another. In reweighting, each MC event is assigned a new weight, $w(\vec{\theta})$, that accounts for the difference between the generation parameters $\vec{\theta}_{g}$ and the hypothesis parameters $\vec{\theta}$ Gainer:2014bta. It follows that MC uncertainties are hypothesis dependent; thus, to do hypothesis testing, it is important to account for them. This is especially important for small-signal searches, performed in the small-sample limit, where a modified $\chi^{2}$ may not be suitable Lyons:1986em. A Poisson likelihood is a more appropriate statistical description of event counts poisson1837recherches, but in that case a proper treatment of MC statistical uncertainties is less straightforward. Solutions to this problem have been discussed in the literature in the context of frequentist statistics by adding nuisance parameters Barlow:1993dm; Cranmer:2012sba; Chirkin:2013lya, as well as through a detailed probabilistic treatment of MC weights Glusenkamp:2017rlp. However, the approaches of Barlow:1993dm, Chirkin:2013lya, and Glusenkamp:2017rlp add time complexity, and Cranmer:2012sba does not provide a full exposition on how to incorporate weighted MC.

We present a new treatment that is valid in both the large and small limits of the data sample size, suited for frequentist and Bayesian analyses, and based on the Poisson likelihood. Our likelihood accounts for statistical uncertainties due to MC, allows for arbitrary event-by-event reweighting, and is computationally efficient. A test statistic based on the proposed likelihood is found to follow a distribution closer to the asymptotic form expected from Wilks’ theorem. An implementation of the likelihood described in this work can be found in MCLLH.

This paper is organized as follows. In Sec. 2 we briefly review two common treatments available in the literature to account for MC statistical uncertainty. In Sec. 3 we define and discuss our new likelihood. In Sec. 4 we study the performance of the likelihood through an example and compare it to other likelihoods in the literature. In Sec. 5 we provide our conclusions. A summary of the likelihoods discussed in the paper, including our main result, is given in Appendix A.

2 The Poisson likelihood and previous work

In order to compare MC with data, events are often binned into distributions across a set of observables. For simplicity we focus on a single bin. In the absence of cross-bin-correlated systematic uncertainties the generalization to multiple bins is simply a product over the likelihood in all bins. This is assumed for the remainder of the paper. It is well known that the count of independent, rare natural processes can be described by the Poisson likelihood, given by

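$$\mathcal{L}(\vec{\theta}|k)=\mathrm{Poisson}(k;\lambda(\vec{\theta}))=\frac{\lambda(\vec{\theta})^{k}e^{-\lambda(\vec{\theta})}}{k!}, \tag{1}$$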

where $\lambda(\vec{\theta})$ is the expected bin count for a hypothesis and $k$ is the number of observed data events. Equation (1) requires exact knowledge of the expected bin count, $\lambda(\vec{\theta})$ . In the case of complex experiments it is often not possible to obtain $\lambda(\vec{\theta})$ exactly and MC techniques are used to estimate the expected distributions. For weighted MC, often a direct substitution of $\lambda(\vec{\theta})$ by $\sum_{i}{w_{i}(\vec{\theta})}$ is used, where $w_{i}$ are the weights of each of the MC events in the bin. Then Eq. (1) can be approximated as

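$$\mathcal{L}_{\textmd{AdHoc}}(\vec{\theta}|k)=\frac{\left(\sum_{i}w_{i}(\vec{\theta})\right)^{k}e^{-\sum_{i}w_{i}(\vec{\theta})}}{k!}. \tag{2}$$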

This ad hoc treatment assumes that the MC estimate of the expected bin counts exactly matches the true expectation rate of the model, neglecting the stochastic nature of MC. In the case of large MC, Eq. (2) converges to Eq. (1) for the hypothesis given by $\vec{\theta}$.
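As a minimal sketch of the ad hoc substitution in Eq. (2), the snippet below evaluates the Poisson log-likelihood with $\lambda$ replaced by the sum of MC weights in a single bin. The function names and the example bin contents are illustrative assumptions, not taken from the paper or from MCLLH.

```python
import math

def log_poisson(k, lam):
    """ln Poisson(k; lam) for integer k >= 0 and lam > 0 (Eq. 1)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_l_adhoc(k, weights):
    """ln L_AdHoc (Eq. 2): lambda is replaced by the sum of MC weights."""
    mu = sum(weights)
    return log_poisson(k, mu)

# Hypothetical bin: 50 MC events of weight 0.2 each, 12 observed data events.
weights = [0.2] * 50
print(log_l_adhoc(12, weights))
```

Note that only the sum of the weights enters; how many MC events produced that sum is ignored, which is the limitation the following sections address.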

2.1 The Barlow-Beeston likelihood

To treat MC statistical uncertainties in the small sample limit, a modification of the Poisson likelihood was introduced in Barlow:1993dm , which is briefly covered below. First, note that the expectation in a single bin is given by contributions from different physical processes, which we index by $j$ . Then, the number of expected events can be written as

$$\lambda(\vec{\theta})=\sum_{j=1}^{s}\bar{n}_{j}(\vec{\theta}), \tag{3}$$

where $\bar{n}_{j}$ is the expected number of MC events from process $j$ that fall in the bin and $s$ is the total number of relevant processes. Substituting Eq. (3) into Eq. (1) gives the Poisson likelihood for observing $k$ data events. For stochastic models, $\bar{n}_{j}$ is unknown. Instead, the MC outcome can be modeled as having drawn $n_{j}$ events from a random process that simulates the physical process. When MC generation is expensive, we can approximate $n_{j}$ as being drawn from a Poisson process with mean $\bar{n}_{j}$. (Footnote 4: The MC generation is a binomial process where we generate a fixed number of events for each process, $N_{j}$, and accept them into the bin of interest with probability $\beta_{j}(\vec{\theta})$, such that $\bar{n}_{j}(\vec{\theta})=\beta_{j}(\vec{\theta})N_{j}$. In the limit of both rare processes ($\beta_{j}\ll 1$) and a large number of generated events ($N_{j}\gg 1$), the total number of observed events can be approximated as Poisson distributed with mean $\lambda(\vec{\theta})=\sum_{j}\beta_{j}(\vec{\theta})N_{j}=\sum_{j}\bar{n}_{j}(\vec{\theta})$.) Profiling over the true number of MC events per process in the bin results in the Barlow-Beeston (BB) likelihood, given by Barlow:1993dm

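$$\mathcal{L}_{\textmd{BB}}(\vec{\theta}|k)=\max_{\{\bar{n}_{j}\}}\frac{\lambda(\vec{\theta})^{k}e^{-\lambda(\vec{\theta})}}{k!}\prod_{j=1}^{s}\frac{\bar{n}_{j}^{n_{j}}e^{-\bar{n}_{j}}}{n_{j}!}, \tag{4}$$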

where $\lambda(\vec{\theta})$ is given by Eq. (3), $n_{j}$ and $\bar{n}_{j}$ are the estimated and true MC counts in the bin respectively, and $\{\bar{n}_{j}\}_{j=1}^{s}$ denotes the $s$ nuisance parameters we have profiled over.

In the above formalism we have produced the MC at the natural rate, but this is not the case for weighted MC. The prescription is given by replacing Eq. (3) with

$$\lambda(\vec{\theta})=\sum_{j=1}^{s}\eta_{j}(\vec{\theta})\bar{n}_{j}, \tag{5}$$

where $\eta_{j}(\vec{\theta})$ is a scale factor for process $j$ that accounts for the differences in the MC generation and the target hypothesis of interest. In this case, the likelihood definition is still given by Eq. (4); an explicit formula for $s=1$ is given in the appendix. However, for arbitrary weight distributions per physical process $\mathcal{L}_{\textmd{BB}}$ may not be appropriate as it neglects the variance from a sum of weights Barlow:1993dm . It remains valid only in the case where the distribution of weights for each process is narrow.
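For the single-process case, the profiling can be done in closed form. The sketch below assumes the $s=1$ expression listed in the appendix table, with a common scale factor $\eta=\mu/m$; setting the derivative of its logarithm with respect to $\bar{m}$ to zero gives $\bar{m}^{*}=(k+m)/(1+\eta)$. This maximizer is derived here for illustration and is not quoted from the paper; the function names are hypothetical.

```python
import math

def log_bb_unprofiled(k, m, eta, m_bar):
    """ln of the s=1 Barlow-Beeston integrand at a given nuisance value m_bar."""
    lam = eta * m_bar  # expected data count, Eq. (5) with one process
    return (k * math.log(lam) - lam - math.lgamma(k + 1)
            + m * math.log(m_bar) - m_bar - math.lgamma(m + 1))

def log_l_bb_single(k, weights):
    """ln L_BB for s=1, profiled analytically over m_bar (assumed derivation)."""
    m = len(weights)
    eta = sum(weights) / m           # common scale factor mu / m
    m_bar = (k + m) / (1.0 + eta)    # stationary point of the log-likelihood
    return log_bb_unprofiled(k, m, eta, m_bar)

# Hypothetical bin: 50 MC events of weight 0.2 each, 12 observed data events.
print(log_l_bb_single(12, [0.2] * 50))
```

Only $\mu$ and $m$ enter this expression, so the spread of the weights is ignored, in line with the caveat above about broad weight distributions.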

2.2 Uncertainties in the large-sample limit

In the large-sample regime, the Gaussian distribution is an appropriate description of the observed data. In this limit, the use of Pearson’s $\chi^{2}$ as a test-statistic Pearson:1900 is common practice. For a single analysis bin, Pearson’s $\chi^{2}$ is defined as

$$\chi^{2}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})}, \tag{6}$$

where we continue to use the approximation $\lambda(\vec{\theta})=\sum_{i}{w_{i}(\vec{\theta})}$ and $w_{i}$ are the weights of each of the MC events. The form of Pearson’s $\chi^{2}$ arises from the fact that the Gaussian distribution of $k$ is the large-sample limit of a Poisson distribution for which the expected statistical variance of the observation is given by $\lambda(\vec{\theta})$ . Systematic uncertainties, under the assumption that they follow a Gaussian distribution and are independent between bins, can be included as

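$$\chi^{2}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})+\sigma_{\rm syst.}^{2}}. \tag{7}$$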

However, this method of incorporating systematic uncertainties tends to overestimate them in shape-only analyses; see Cogswell:2018auu for a recent discussion in the context of reactor neutrino anomalies. Similarly, one can include uncertainties to account for statistical fluctuations of the MC in the test-statistic. In doing so, the Gaussian behavior is implicit and the modified $\chi^{2}$ reads

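$$\chi^{2}_{\rm mod}(\vec{\theta})=\frac{(k-\lambda(\vec{\theta}))^{2}}{\lambda(\vec{\theta})+\sigma_{\rm syst.}^{2}+\sigma_{\rm mc}^{2}}, \tag{8}$$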

where $\sigma^{2}_{\rm mc}$ is the MC statistical uncertainty in the bin given by

$$\sigma_{\rm mc}^{2}(\vec{\theta})\equiv\sum_{i=1}^{m}w_{i}(\vec{\theta})^{2}. \tag{9}$$

Note that this test-statistic definition is not appropriate in the small-sample regime, as the data is no longer well described by a Gaussian distribution. If one uses a $\chi^{2}$ test-statistic in the small-sample regime, one ought to calculate the test-statistic distribution properly to achieve appropriate coverage cowan1998statistical .
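The modified $\chi^{2}$ for a single bin can be sketched as follows, using $\lambda\approx\sum_{i}w_{i}$ and the MC variance $\sum_{i}w_{i}^{2}$. The function name and the optional `sigma_syst2` argument are illustrative choices.

```python
def chi2_mod(k, weights, sigma_syst2=0.0):
    """Modified chi-square for one bin with MC statistical variance included."""
    mu = sum(weights)                        # MC estimate of lambda
    sigma_mc2 = sum(w * w for w in weights)  # MC statistical variance
    return (k - mu) ** 2 / (mu + sigma_syst2 + sigma_mc2)

# Hypothetical bin: 10 unit-weight MC events and 12 observed data events.
print(chi2_mod(12, [1.0] * 10))
```

With unit weights the MC variance equals the Poisson variance, so the denominator doubles relative to Pearson's $\chi^{2}$.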

3 Generalization of the Poisson likelihood

Ideally, we would like to obtain the expected event count for any hypothesis, $\lambda(\vec{\theta})$. However, we are considering problems where this relationship is not known and $\lambda$ is instead estimated by MC. The key difference here is that, instead of using exact knowledge of $\lambda$, we want to perform Bayesian inference to obtain $\mathcal{P}(\lambda|\vec{\theta})$ using the MC available. Assuming the weights are functions of $\vec{\theta}$, we have

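$$\mathcal{L}_{\rm General}(\vec{\theta}|k)=\int_{0}^{\infty}\frac{\lambda^{k}e^{-\lambda}}{k!}\,\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))\,d\lambda, \tag{10}$$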

where the distribution of $\lambda$ , $\mathcal{P}\left(\lambda|\vec{w}(\vec{\theta})\right)$ , is inferred from the MC. The likelihood, $\mathcal{L}_{\textmd{AdHoc}}$ , in Eq. (2) is recovered when $\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))=\delta\left(\lambda-\sum_{i}{w_{i}(\vec{\theta})}\right)$ , but clearly this is an unrealistic assumption as it presumes perfect knowledge of the parameter $\lambda(\vec{\theta})$ from a finite number of realizations. Instead, it is more appropriate to construct $\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))$ based on the MC realization. This is given by

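$$\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))=\frac{\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))\,\mathcal{P}(\lambda)}{\int_{0}^{\infty}\mathcal{L}(\lambda^{\prime}|\vec{w}(\vec{\theta}))\,\mathcal{P}(\lambda^{\prime})\,d\lambda^{\prime}}, \tag{11}$$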

where $\mathcal{P}(\lambda)$ is a prior on $\lambda$ that must be chosen appropriately and $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ is the likelihood of $\lambda$ given $\vec{w}(\vec{\theta})$ . This is similar to Barlow:1993dm ; Cranmer:2012sba , but instead of fitting $\lambda$ as a nuisance parameter as in $\mathcal{L}_{\textmd{BB}}$ in Eq. (4), we marginalize over it in Eq. (10) as informed by the MC weights. When $\mathcal{L}_{\rm General}$ is used under a frequentist approach, the marginalization over $\lambda$ implies a hybrid Bayesian-frequentist construction, similar to the treatment of nuisance parameters described in Cousins:1991qz and employed in Abe:2017vif ; Abe:2018wpn .

This section is organized as follows. We first derive $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ assuming identical weights in Sec. 3.1, then extend it to arbitrary weights in Sec. 3.2. With this in hand, we calculate an analytic expression for Eq. (10) using Eq. (11) under a uniform $\mathcal{P}(\lambda)$ in Sec. 3.3. In Sec. 3.4 we briefly discuss a family of distributions as possible alternative priors. In Sec. 3.5 we show that our effective likelihood converges to Eq. (2) in the limit of large MC size. Finally, in Sec. 3.6 we provide some intuition on the behavior of our generalized likelihood. Equation (25), along with the definitions of $\mu$ and $\sigma^{2}$ given in Eq. (12), constitute the primary result of this work.

3.1 Derivation of $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ for identical weights

In this section we derive $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ for identical weights. We will show that $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ can be written in terms of two quantities

$$\mu\equiv\sum_{i=1}^{m}w_{i}\quad\textmd{and}\quad\sigma^{2}\equiv\sum_{i=1}^{m}w_{i}^{2} \tag{12}$$

for a bin with $m$ MC events.

For identical weights, $w\equiv w_{i}~{}\forall i$ , the following equalities hold:

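$$\mu=wm,\quad\sigma^{2}=w^{2}m,\quad w=\sigma^{2}/\mu,\quad\textmd{and}\quad m=\mu^{2}/\sigma^{2}. \tag{13}$$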

Assume that $m$ is the outcome of sampling a Poisson-distributed random variable $M$ with probability mass function

$$\mathrm{Poisson}(M=m;\bar{m})=\frac{e^{-\bar{m}}\bar{m}^{m}}{m!}, \tag{14}$$

where $\bar{m}$ is the mean of the distribution. Further, assume that the expected number of data events $\lambda=w\bar{m}$ so that $\bar{m}=\lambda/w$ . Substituting back into Eq. (14), we can interpret $\mathrm{Poisson}(M=m;\bar{m})$ as a likelihood function of $\lambda$

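$$\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))=\mathcal{L}(\lambda|\mu,\sigma)=\frac{e^{-\lambda\mu/\sigma^{2}}(\lambda\mu/\sigma^{2})^{\mu^{2}/\sigma^{2}}}{(\mu^{2}/\sigma^{2})!}, \tag{15}$$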

as $\mu$ and $\sigma$ fully specify $\vec{w}(\vec{\theta})$ for identical weights.

3.2 Extension to arbitrary weights

The derivation above assumed identical weights. For arbitrary weights, $\mu$ is an outcome sampled from a compound Poisson distribution (CPD), which can be approximated by a scaled Poisson distribution (SPD) by matching the first and second moments of the two distributions Bohm:2013gla . In order to make the connection, first rewrite $\mu$ and $\sigma^{2}$ as

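$$\mu=w_{\mathrm{Eff}}m_{\mathrm{Eff}}\quad\textmd{and}\quad\sigma^{2}=w_{\mathrm{Eff}}^{2}m_{\mathrm{Eff}}, \tag{16}$$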

where $m_{\mathrm{Eff}}$ is the effective number of MC events and $w_{\mathrm{Eff}}$ the effective weight. From Eq. (13) these are given by: $m_{\mathrm{Eff}}=\mu^{2}/\sigma^{2}$ and $w_{\mathrm{Eff}}=\sigma^{2}/\mu$ . Next, assume $\bar{m}=\lambda/w_{\mathrm{Eff}}$ and

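$$\mathcal{L}(\bar{m}|m_{\mathrm{Eff}})=\frac{e^{-\bar{m}}\bar{m}^{m_{\mathrm{Eff}}}}{\Gamma(m_{\mathrm{Eff}}+1)}, \tag{17}$$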

where $\lambda$ again is the expected number of events in data. Equation (17) can be written as a likelihood function of $\lambda$ ,

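$$\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))=\mathcal{L}(\lambda|\mu,\sigma)=\frac{e^{-\lambda\mu/\sigma^{2}}(\lambda\mu/\sigma^{2})^{\mu^{2}/\sigma^{2}}}{\Gamma(\mu^{2}/\sigma^{2}+1)}, \tag{18}$$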

which is identical to Eq. (15) except the denominator is now a gamma function instead of a factorial. However, since the denominator does not depend on $\lambda$ it cancels out in Eq. (11).

To understand this approximation, note that the maximum likelihood in Eq. (17) occurs when $\bar{m}=m_{\mathrm{Eff}}$ . The first and second moments of the SPD random variable $w_{\mathrm{Eff}}M$ , where $M\sim\mathrm{Poisson}(m_{\mathrm{Eff}})$ , are given by

$$\mathrm{E}[w_{\mathrm{Eff}}M]=\mu, \tag{19}$$

and

$$\mathrm{Var}[w_{\mathrm{Eff}}M]=\sigma^{2}. \tag{20}$$

This shows that the SPD, under the maximum likelihood solution for the given MC realization, has first and second moments that match the sample mean, $\mu$ , and variance, $\sigma^{2}$ , respectively. These are equal to the first and second moments of the CPD as described in Bohm:2013gla . By assuming that $\mu$ is drawn from a SPD, we can treat $\mu$ and $\sigma$ as outcomes that fix the likelihood function of the underlying scaled expectation $\lambda$ , analogous to the case of identical weights. Because both the first and second moments are matched, this approximation accounts for the variance of the CPD unlike $\mathcal{L}_{\textmd{BB}}$ , which only accounts for the mean. Thus, while $\mathcal{L}_{\textmd{BB}}$ is valid only for the case of narrow weight distributions, our approximation remains valid for broader distributions.
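The moment matching can be checked numerically. The sketch below computes $\mu$, $\sigma^{2}$, $m_{\mathrm{Eff}}$ and $w_{\mathrm{Eff}}$ from a list of weights; the function name and the example weights are hypothetical.

```python
def effective_mc(weights):
    """Return (mu, sigma2, m_eff, w_eff) for the MC weights in one bin."""
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    m_eff = mu * mu / sigma2   # effective number of MC events
    w_eff = sigma2 / mu        # effective weight
    return mu, sigma2, m_eff, w_eff

mu, sigma2, m_eff, w_eff = effective_mc([0.5, 1.5, 2.0])
# A scaled Poisson variable w_eff * M with M ~ Poisson(m_eff) has
# mean w_eff * m_eff and variance w_eff**2 * m_eff.
print(mu, w_eff * m_eff)
print(sigma2, w_eff ** 2 * m_eff)
```

For identical weights the effective count reduces to the actual number of MC events, and a broader weight distribution lowers $m_{\mathrm{Eff}}$ below $m$.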

3.3 The effective likelihood

Now that we have an expression for $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ from the MC, we can proceed to compute Eq. (10) under a uniform $\mathcal{P}(\lambda)$ . To simplify the notation, let

$$\alpha=\frac{\mu^{2}}{\sigma^{2}}+1\quad\textmd{and}\quad\beta=\frac{\mu}{\sigma^{2}}. \tag{21}$$

Then, assuming a uniform $\mathcal{P}(\lambda)$ and substituting Eq. (18) for $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ in Eq. (11) we obtain

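$$\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))=\frac{e^{-\lambda\beta}\lambda^{\alpha-1}\beta^{\alpha}}{\Gamma(\alpha)} \tag{22}$$

$$=\mathcal{G}(\lambda;\alpha,\beta), \tag{23}$$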

where $\Gamma$ is the gamma function and $\mathcal{G}$ the gamma distribution with shape parameter $\alpha$ and inverse-scale parameter $\beta$. Note that in going from Eq. (18) to the gamma posterior $\mathcal{G}(\lambda;\alpha,\beta)$ of Eq. (23), $\mu$ and $\sigma^{2}$ go from random variates for a particular $\lambda$ to parameters that govern the probability density of $\lambda$. With this choice of $\mathcal{L}(\lambda|\vec{w}(\vec{\theta}))$ and $\mathcal{P}(\lambda)$, we can rewrite $\mathcal{L}_{\rm General}$ from Eq. (10) as

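$$\mathcal{L}_{\textmd{Eff}}(\vec{\theta}|k)=\frac{\beta^{\alpha}\Gamma(k+\alpha)}{k!\,(1+\beta)^{k+\alpha}\Gamma(\alpha)} \tag{24}$$

$$=\left(\frac{\mu}{\sigma^{2}}\right)^{\frac{\mu^{2}}{\sigma^{2}}+1}\Gamma\left(k+\frac{\mu^{2}}{\sigma^{2}}+1\right)\left[k!\left(1+\frac{\mu}{\sigma^{2}}\right)^{k+\frac{\mu^{2}}{\sigma^{2}}+1}\Gamma\left(\frac{\mu^{2}}{\sigma^{2}}+1\right)\right]^{-1}, \tag{25}$$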

where $\mu$ and $\sigma^{2}$ depend on $\vec{\theta}$ through $\vec{w}$ .
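A minimal single-bin sketch of $\mathcal{L}_{\textmd{Eff}}$ is given below, written with log-gamma functions for numerical stability. It follows the closed form in terms of $\alpha=\mu^{2}/\sigma^{2}+1$ and $\beta=\mu/\sigma^{2}$; the function name and example weights are illustrative, and this is not the MCLLH implementation.

```python
import math

def log_l_eff(k, weights):
    """ln L_Eff for one bin with k observed events and the given MC weights."""
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    alpha = mu * mu / sigma2 + 1.0   # shape parameter, uniform prior
    beta = mu / sigma2               # inverse-scale parameter
    return (alpha * math.log(beta)
            + math.lgamma(k + alpha)
            - math.lgamma(k + 1)
            - (k + alpha) * math.log1p(beta)
            - math.lgamma(alpha))

# Hypothetical bin with three unequal weights and 5 observed events.
weights = [0.5, 1.5, 2.0]
print(log_l_eff(5, weights))

# As a function of k the expression is a negative-binomial probability,
# so summing over k should give approximately one.
print(sum(math.exp(log_l_eff(k, weights)) for k in range(400)))
```

The cost per bin is a handful of log-gamma evaluations once $\mu$ and $\sigma^{2}$ are accumulated, with no nuisance-parameter minimization.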

3.4 A family of likelihoods

It is possible to generalize the choice of $\alpha$ and $\beta$ in Eq. (21) by choosing a particular form of $\mathcal{P}(\lambda)$ . Since the distribution of interest is a Poisson distribution, a well-motivated choice of $\mathcal{P}(\lambda)$ is a gamma distribution (the conjugate prior of the Poisson distribution) Fink97acompendium ; also see Glusenkamp:2017rlp for a recent discussion. Thus we set $\mathcal{P}(\lambda)=\mathcal{G}(\lambda;a,b)$ , where $a$ and $b$ are the shape and inverse-scale parameters of the gamma distribution, respectively. These hyper-parameters dictate the distribution of the Poisson parameter $\lambda$ bernardo2009bayesian . In line with our previous discussion, the gamma distribution prior implies that Eq. (21) becomes

$$\alpha=\frac{\mu^{2}}{\sigma^{2}}+a\quad\textmd{and}\quad\beta=\frac{\mu}{\sigma^{2}}+b. \tag{26}$$

The rest of the likelihood derivation remains the same. This allows the choice of specific values for $a$ and $b$ to satisfy certain properties. Equation (21) is obtained with $a=1$ and $b=0$ , corresponding to the uniform prior discussed above. Another interesting choice is to require that the mean and variance of $\mathcal{P}(\lambda|\mu,\sigma)$ match $\mu$ and $\sigma^{2}$ , respectively. This can be achieved by setting $a=b=0$ , and we refer to this parameter assignment as $\mathcal{L}_{\textmd{Mean}}$ . In the case of identical weights, $\mathcal{L}_{\textmd{Mean}}$ is equivalent to Eq. (20) in Glusenkamp:2017rlp . Both choices are improper priors, as technically they are limiting cases of the gamma distribution. However, we can use them to obtain proper $\mathcal{P}(\lambda|\mu,\sigma)$ distributions.
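The family of likelihoods can be sketched by exposing the prior hyper-parameters $a$ and $b$ as arguments: $a=1$, $b=0$ gives $\mathcal{L}_{\textmd{Eff}}$ and $a=b=0$ gives $\mathcal{L}_{\textmd{Mean}}$. The function name and defaults are assumptions made for illustration.

```python
import math

def log_l_family(k, weights, a=1.0, b=0.0):
    """ln-likelihood for a gamma prior G(lambda; a, b) on the bin expectation."""
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    alpha = mu * mu / sigma2 + a
    beta = mu / sigma2 + b
    return (alpha * math.log(beta)
            + math.lgamma(k + alpha)
            - math.lgamma(k + 1)
            - (k + alpha) * math.log1p(beta)
            - math.lgamma(alpha))

weights = [0.5, 1.5, 2.0]            # hypothetical bin, mu = 4.0
l_eff = log_l_family(5, weights)               # uniform prior
l_mean = log_l_family(5, weights, a=0.0, b=0.0)  # mean-matching choice
print(l_eff, l_mean)
```

For $a=b=0$ the gamma posterior has mean $\alpha/\beta=\mu$, so the expected count under this likelihood equals the sum of weights, whereas the uniform prior shifts the posterior mean up by $\sigma^{2}/\mu$.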

In Glusenkamp:2017rlp, a convolutional approach is suggested for handling arbitrary weights. We refer to this likelihood as $\mathcal{L}_{\textmd{G}}$. Each weighted MC event has $\mathcal{P}(\lambda_{i}|w_{i})=\mathcal{G}(\lambda_{i};1,1/w_{i})$, corresponding to the prior $\mathcal{P}(\lambda_{i})=\mathcal{G}(\lambda_{i};0,0)$, such that $\lambda=\sum_{i}^{m}\lambda_{i}$. The likelihood $\mathcal{L}_{\textmd{Mean}}$ is a good analytic approximation of the more computationally expensive calculation given in Glusenkamp:2017rlp for $\mathcal{L}_{\textmd{G}}$. The latter has time complexity $\mathcal{O}(k^{2}m)$, where $k$ and $m$ are the number of data and MC events in the bin, respectively. When assuming uniform priors, the convolutional approach does not recover the gamma posterior of Eq. (23) for identical weights, so it cannot be used as a generalization of $\mathcal{L}_{\textmd{Eff}}$.

3.5 Convergence of the effective likelihood

In this section we will show that, if the relative uncertainty of the bin content vanishes as MC size increases, $\mathcal{L}_{\textmd{Eff}}$ and $\mathcal{L}_{\textmd{Mean}}$ both converge to $\mathcal{L}_{\textmd{AdHoc}}$ .

For positive weights $w_{i}$, the relative uncertainty $\sigma/\mu$ is bounded between zero and one. Uncertainty as large as the estimated quantity, $\sigma/\mu=1$, occurs if and only if $m=1$. In the limit that $\sigma/\mu$ goes to zero, the gamma posterior of Eq. (23) converges to $\delta(\lambda-\mu)$, and $\mathcal{L}_{\textmd{Eff}}$ and $\mathcal{L}_{\textmd{Mean}}$ both go to $\mathcal{L}_{\textmd{AdHoc}}$. We can see this by noting that the shape parameter, $\alpha$, goes to infinity as the MC relative uncertainty goes to zero, turning the gamma distribution into a Gaussian distribution of mean $\alpha/\beta$ and variance $\alpha/\beta^{2}$. This Gaussian converges to $\delta(\lambda-\mu)$ in the limit of vanishing $\sigma/\mu$. Substituting into Eq. (10), we recover Eq. (2), which converges to Eq. (1) in the large MC limit.

It remains to be shown that the relative uncertainty of the bin content vanishes as MC size increases. For identical weights,

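$$\lim_{m\to\infty}\frac{\sigma_{\rm identical}}{\mu_{\rm identical}}=\lim_{m\to\infty}\frac{1}{\sqrt{m}}=0. \tag{27}$$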

For arbitrary weights, the limit can be written in terms of the running average of $w_{i}$ and $w_{i}^{2}$ as

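$$\lim_{m\to\infty}\frac{\sigma}{\mu}=\lim_{m\to\infty}\frac{1}{\sqrt{m}}\frac{\sqrt{\langle w^{2}\rangle_{m}}}{\langle w\rangle_{m}}, \tag{28}$$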

where $\langle w\rangle_{m}$ is the average over $w_{i}$ and $\langle w^{2}\rangle_{m}$ the average over $w_{i}^{2}$ for $i\leq m$ . This shows that as long as $\langle w^{2}\rangle_{m}$ does not grow much faster than $\langle w\rangle_{m}^{2}$ , the limit will converge to zero. For weight distributions with positive support and finite, non-zero mean, this should be the case.
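The convergence can be illustrated numerically for identical weights, where $\sigma^{2}=\mu^{2}/m$. The sketch below holds $\mu$ fixed and increases $m$, printing the absolute difference between $\ln\mathcal{L}_{\textmd{Eff}}$ and the Poisson log-likelihood. The function names and the chosen values of $k$, $\mu$ and $m$ are illustrative assumptions.

```python
import math

def log_poisson(k, lam):
    """ln Poisson(k; lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_l_eff_moments(k, mu, sigma2):
    """ln L_Eff written directly in terms of mu and sigma^2."""
    alpha = mu * mu / sigma2 + 1.0
    beta = mu / sigma2
    return (alpha * math.log(beta) + math.lgamma(k + alpha)
            - math.lgamma(k + 1) - (k + alpha) * math.log1p(beta)
            - math.lgamma(alpha))

def gap_identical_weights(k, mu, m):
    """|ln L_Eff - ln L_AdHoc| for m identical weights summing to mu."""
    sigma2 = mu * mu / m   # m weights of size mu / m each
    return abs(log_l_eff_moments(k, mu, sigma2) - log_poisson(k, mu))

for m in (10, 100, 10_000, 1_000_000):
    print(m, gap_identical_weights(12, 10.0, m))
```

In this setup the gap shrinks roughly in proportion to $1/m$, consistent with $\sigma/\mu=1/\sqrt{m}$ going to zero.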

3.6 Behavior of the effective likelihood

It is instructive to examine the behavior of $\mathcal{L}_{\textmd{Eff}}$ for a single bin. It is standard to work with the log-likelihood $l(\mu,\sigma|k)\equiv-2\ln\mathcal{L}(\mu,\sigma|k)$ and we do so here. Figure 1 shows the contour lines for $l_{\textmd{Eff}}(\mu,\sigma|k=100)$. Since $\mu$ and $\sigma$ are both dependent on the same underlying parameters, $\vec{\theta}$, a minimization over $\vec{\theta}$ can be thought of as a constrained minimization over $\mu$ and $\sigma$. This is visualized as the gray region in Fig. 1, which indicates where $\mu$ and $\sigma$ are allowed to vary for some physics model. (Footnote 5: A general bound for positive weights is $\sigma\leq\mu\leq\sigma\sqrt{m}$, which can be seen from their definitions.) Similarly, we can also visualize the standard Poisson log-likelihood, $l_{\textmd{Poisson}}(\mu|k=100)$, which is simply $l_{\textmd{Eff}}$ constrained along the line $\sigma=0$.
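The bound $\sigma\leq\mu\leq\sigma\sqrt{m}$ for positive weights can be spelled out in two lines. The worked steps below are a sketch supplied for clarity, assuming $w_{i}>0$ for all $i$ and the definitions of $\mu$ and $\sigma^{2}$ in Eq. (12); the upper bound uses the Cauchy-Schwarz inequality.

```latex
% Assumes w_i > 0 for all i; mu and sigma^2 as defined in Eq. (12).
\begin{align}
\mu^{2} &= \Big(\sum_{i=1}^{m} w_{i}\Big)^{2}
         = \sum_{i=1}^{m} w_{i}^{2} + 2\sum_{i<j} w_{i}w_{j}
         \;\geq\; \sigma^{2}
         &&\Rightarrow\quad \sigma \leq \mu, \\
\mu^{2} &= \Big(\sum_{i=1}^{m} 1\cdot w_{i}\Big)^{2}
         \;\leq\; m\sum_{i=1}^{m} w_{i}^{2} = m\,\sigma^{2}
         &&\Rightarrow\quad \mu \leq \sigma\sqrt{m}.
\end{align}
```

The lower bound is saturated for $m=1$, where the cross terms vanish, and the upper bound is saturated for identical weights.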

To further illustrate the effect of the accessible region, we minimize $l_{\textmd{Eff}}$ over $\mu$ for two possible constraints: fixed $\sigma$ and fixed $\sigma/\mu$. In terms of Eq. (12), a sufficient but not necessary condition for constant $\sigma/\mu$ with varying $\mu$ is equal weights, and a necessary but not sufficient condition for constant $\sigma$ with varying $\mu$ is $m\geq 2$. For a standard Poisson likelihood, $\hat{\mu}_{\textmd{Poisson}}\equiv\arg\min_{\mu}l_{\textmd{Poisson}}(\mu|k)=k$. Figure 2 shows $\hat{\mu}_{\textmd{Eff}}\equiv\arg\min_{\mu}l_{\textmd{Eff}}(\mu,\sigma|k=100)$ as well as the region where $l_{\textmd{Eff}}(\mu,\sigma|k)-l_{\textmd{Eff}}(\hat{\mu},\sigma|k)<1$ for fixed $\sigma$ (left) and fixed $\sigma/\mu$ (right). Note that the shaded regions for fixed $\sigma$ are calculated without requiring that $\mu\geq\sigma$, which would be the case for Eq. (12). As $\sigma$ goes to zero, the Poisson best-fit and Wilks’ $1\sigma$ interval are recovered. As $\sigma$ or $\sigma/\mu$ increases, the shaded region becomes wider, as expected. For fixed $\sigma$, $\hat{\mu}_{\textmd{Eff}}$ does not deviate much from $\hat{\mu}_{\textmd{Poisson}}$, while for fixed $\sigma/\mu$, $\hat{\mu}_{\textmd{Eff}}$ deviates from $\hat{\mu}_{\textmd{Poisson}}$ as $\sigma/\mu$ increases. The shaded regions correspond to the $1\sigma$ interval assuming the approximation from Wilks’ theorem and give a sense of the shape of $\mathcal{L}_{\textmd{Eff}}$ projected onto one-dimensional slices.

4 Example and performance

In practice, likelihoods such as those discussed above are used to estimate physical parameters from data. As discussed in Sec. 1, weighted MC is often used to compute the likelihood of a particular physical scenario given the observed data. Statements are then made about the physical scenarios either by maximizing the likelihood or by examining the posterior distribution assuming some priors. We examine a toy experiment where we measure the mode, $\Omega$ , and normalization, $\Phi$ , of a Gaussian-distributed signal against a steeply falling inverse power-law background. The performance of $\mathcal{L}_{\textmd{Eff}}$ is evaluated and compared against other likelihoods.

For our toy experiment, we generate the true energies, $E_{t}$ , of synthetic data events from a background falling as $(E_{t}/100\mathrm{GeV})^{-\gamma_{t}^{b}}$ , where $\gamma_{t}^{b}=3.07$ , and a Gaussian signal centered at $\Omega_{t}=125$ GeV with width of $\sigma_{t}=2$ GeV and normalization $\Phi_{t}=5013$ for a fixed number of expected events. Our imaginary detector is sensitive in the 100–160 GeV range. To simulate the effect of a real detector, the true energy, $E_{t}$ , is smeared by 5% for background and 3% for signal to obtain event-by-event reconstructed energies, $E_{r}$ . We generate a total number of MC events, $N_{\mathrm{MC}}$ , split evenly between the components. Generation is performed assuming inverse power-law distributions of $(E_{t}/100\mathrm{GeV})^{-\gamma_{g}}$ for signal and $(E_{t}/100\mathrm{GeV})^{-\gamma_{g}^{b}}$ for background. We choose $\gamma_{g}=1$ and $\gamma_{g}^{b}=2$ . Reweighting of the MC can then be performed as a function of $E_{t}$ and forward-folded onto distributions in $E_{r}$ over which the events are histogrammed and likelihoods evaluated. A diagram of the steps described above is shown in Fig. 3. For all toy experiments, the background component, $(\Phi^{b},\gamma^{b})$ , and the signal width, $\sigma$ , are kept fixed to their true values. Only the signal mean, $\Omega$ , and normalization, $\Phi$ , are treated as free parameters.
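The generation and reweighting of the background component can be sketched as below. Several details are assumptions made for illustration and are not specified in the text: the weight is taken as the ratio of normalized target and generation densities scaled to a hypothetical normalization `phi_b`, the smearing is modeled as a Gaussian with relative width 5%, and all function names are invented.

```python
import math
import random

E_MIN, E_MAX = 100.0, 160.0  # GeV, detector range from the text

def power_law_norm(gamma):
    """Integral of E^-gamma over [E_MIN, E_MAX]."""
    if gamma == 1.0:
        return math.log(E_MAX / E_MIN)
    p = 1.0 - gamma
    return (E_MAX ** p - E_MIN ** p) / p

def power_law_pdf(e, gamma):
    """Normalized power-law density on [E_MIN, E_MAX]."""
    return e ** (-gamma) / power_law_norm(gamma)

def sample_power_law(gamma, rng):
    """Inverse-CDF sampling of E^-gamma on [E_MIN, E_MAX]."""
    u = rng.random()
    if gamma == 1.0:
        return E_MIN * (E_MAX / E_MIN) ** u
    p = 1.0 - gamma
    return (E_MIN ** p + u * (E_MAX ** p - E_MIN ** p)) ** (1.0 / p)

def background_mc(n_mc, phi_b, gamma_gen=2.0, gamma_target=3.07,
                  smear_frac=0.05, seed=1):
    """Return (E_true, E_reco, weight) tuples for the background MC."""
    rng = random.Random(seed)
    events = []
    for _ in range(n_mc):
        e_true = sample_power_law(gamma_gen, rng)
        # Assumed smearing model: Gaussian with relative width smear_frac.
        e_reco = e_true * (1.0 + smear_frac * rng.gauss(0.0, 1.0))
        # Importance weight: target density over generation density.
        w = (phi_b * power_law_pdf(e_true, gamma_target)
             / power_law_pdf(e_true, gamma_gen) / n_mc)
        events.append((e_true, e_reco, w))
    return events

events = background_mc(20000, phi_b=1.0e5)  # phi_b is a hypothetical value
print(sum(w for _, _, w in events))
```

Because the weights depend only on $E_{t}$, changing the hypothesis means recomputing the weights while reusing the same generated events; histogramming the weights in $E_{r}$ then yields the per-bin $\mu$ and $\sigma^{2}$.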

4.1 Point estimation

Figure 4 shows the expectation in $E_{t}$ as well as the data and $\mathcal{L}_{\textmd{Eff}}$ best-fit distributions in $E_{r}$ . The leftmost panel shows the expectation for both signal and background assuming no smearing in $E_{t}$ . The three other panels show the smeared, $E_{r}$ , distribution for data (black) and the best-fit result from $\mathcal{L}_{\textmd{Eff}}$ for three different MC datasets (orange) of varying MC size. The smeared shape of the signal peak is clearly visible in data, but not in the smallest size MC. As the MC increases in size, the best-fit MC can be seen to converge to data.

The best-fit values for the example shown in Fig. 4 are given in Table 1 for $\mathcal{L}_{\textmd{Eff}}$ and $\mathcal{L}_{\textmd{AdHoc}}$ . As point estimators, both likelihoods return similar values. This is driven by the fact that the same underlying MC distribution is used to fit to the data. The effect of convoluting $\mathcal{P}(\lambda|\vec{w}(\vec{\theta}))$ mostly serves to broaden the likelihood space, while preserving the maximum within the constraints described in Sec. 3.6. In the large MC limit, both likelihoods can be used for unbiased point estimation, provided that the likelihood space is smooth enough for standard minimization techniques to probe the global minimum.

4.2 Coverage

Due to the higher computational cost of computing frequentist confidence intervals by generating pseudodata to estimate the test-statistic ( $\mathcal{TS}$ ) distribution, it is common to use the approximation given by Wilks’ theorem for the cases where the underlying hypotheses hold. In the case of small MC, a likelihood description that neglects MC uncertainties may lead to undercoverage even for a large data sample. In this section, we will use $\mathcal{TS}=\Delta l=l(\vec{\theta}_{\rm true})-l(\hat{\vec{\theta}})$ , where $\vec{\theta}_{\rm true}$ and $\hat{\vec{\theta}}$ correspond to the true and best-fit $(\Omega,\Phi)$ , respectively. We evaluate the coverage properties, computed using the asymptotic approximation given by Wilks’ theorem, of the two-dimensional fit over $(\Omega,\Phi)$ for several likelihood constructions. These include the modified- $\chi^{2}$ , $\mathcal{L}_{\textmd{AdHoc}}$ , $\mathcal{L}_{\textmd{BB}}$ , $\mathcal{L}_{\textmd{Mean}}$ , and $\mathcal{L}_{\textmd{Eff}}$ . These five test-statistics were chosen on the basis of their computation speed and as tests of different approaches towards the treatment of weighted MC. Note that using Wilks’ theorem is an approximation and in general we encourage the reader to perform coverage tests for their own particular setup.

Several configurations were tested, all under the assumptions of the toy experiment described in Sec. 4.1. The MC was generated for two different settings of the total number of events: $10^{3}$ and $10^{6}$ . For each setting, 500 toy experiments were generated, their best-fits found, and their $\Delta l$ evaluated. Each toy experiment was classified as covering $\vec{\theta}_{\rm true}$ at a specified level $p$ if $\Delta l<I(p;2)$ , where $I$ is the inverse of the $\chi^{2}$ cumulative distribution function and $2$ indicates the number of degrees of freedom.
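The classification rule can be written compactly because the $\chi^{2}$ distribution with two degrees of freedom has the closed-form inverse $I(p;2)=-2\ln(1-p)$. The sketch below is illustrative only. The function names are hypothetical, and it assumes the test-statistic values have already been put in the convention where Wilks' theorem predicts a $\chi^{2}$ distribution, that is, as $-2$ times the log-likelihood difference between the true and best-fit points.

```python
import math


def chi2_ppf_2dof(p):
    """Inverse chi-square CDF for 2 degrees of freedom: I(p; 2) = -2 ln(1 - p)."""
    return -2.0 * math.log(1.0 - p)


def empirical_coverage(ts_values, p):
    """Fraction of toy experiments whose test statistic falls below I(p; 2)."""
    threshold = chi2_ppf_2dof(p)
    covered = sum(1 for ts in ts_values if ts < threshold)
    return covered / len(ts_values)
```

Evaluating `empirical_coverage` on the 500 toy test-statistic values for a grid of levels $p$ and comparing the result with $p$ itself gives a curve of true versus estimated coverage.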

Figure 5 shows the percentage of times the true parameters were within the confidence intervals at level $p$ as a function of the estimated coverage percentile for that level. First note that, as expected, the true coverage is highly dependent on MC size, with higher MC size leading towards improved agreement. In the case of $N_{\mathrm{MC}}=10^{3}$ , $\mathcal{L}_{\textmd{BB}}$ , $\mathcal{L}_{\textmd{Mean}}$ , modified- $\chi^{2}$ , and $\mathcal{L}_{\textmd{AdHoc}}$ all undercover to varying degrees of severity. For $N_{\mathrm{MC}}=10^{6}$ , $\mathcal{L}_{\textmd{AdHoc}}$ still undercovers, which is not surprising as it presumes zero MC uncertainty, but the other likelihoods exhibit good agreement. In this benchmark test, $\mathcal{L}_{\textmd{Eff}}$ exhibits the best coverage properties. However, note that using Wilks’ theorem to evaluate confidence intervals implies an asymptotic approximation. In general, this approximation does not necessarily hold, and we encourage the reader to always perform their own coverage tests suitable for their particular experimental setup.

4.3 Posterior distributions

It is also possible to use $\mathcal{L}_{\textmd{Eff}}$ in a Bayesian approach. Using Bayes’ theorem, the posterior

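$$\mathcal{P}(\vec{\theta}|k)=\frac{\mathcal{L}(\vec{\theta}|k)\,\pi(\vec{\theta})}{\int d\vec{\theta}^{\prime}\,\mathcal{L}(\vec{\theta}^{\prime}|k)\,\pi(\vec{\theta}^{\prime})},$$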

where $\pi(\vec{\theta})$ is a prior on the parameters. As evaluation of the normalization factor can be challenging, $\mathcal{P}(\vec{\theta}|k)$ can be approximated using Markov Chain Monte Carlo (MCMC). For our toy example, we used emcee ForemanMackey:2012ig to sample $\mathcal{P}(\vec{\theta}|k)$ under a uniform box prior for two different likelihood functions: $\mathcal{L}_{\textmd{Eff}}$ and $\mathcal{L}_{\textmd{AdHoc}}$ . The sampling was performed using the data and MC sets described in Sec. 4.1.

Figure 6 shows the posterior distributions of $\Omega$ and $\Phi$ . For each comparison, $\mathcal{L}_{\textmd{Eff}}$ (blue) and $\mathcal{L}_{\textmd{AdHoc}}$ (orange) were sampled using the same underlying data and MC. We used 20 walkers with 300 burn-in steps followed by 1000 steps as settings for emcee. The left and center columns show the marginal posterior distributions for the mass, $\Omega$ , and normalization, $\Phi$ , respectively. The true value is indicated by the dashed, vertical line. The rightmost column shows the joint posterior distribution with 68% (solid) and 95% (dashed) contours. The true values are indicated by the star. With $\mathcal{L}_{\textmd{AdHoc}}$ , the true value of the parameter is highly improbable for the lower MC-size cases of the top and middle rows. In contrast, the posterior evaluated using $\mathcal{L}_{\textmd{Eff}}$ has increased width due to the reduced MC size. Even for $N_{\mathrm{MC}}=10^{6}$ (bottom row), the shape of the posterior evaluated using $\mathcal{L}_{\textmd{AdHoc}}$ is narrower than that using $\mathcal{L}_{\textmd{Eff}}$ . Credible regions estimated using $\mathcal{L}_{\textmd{AdHoc}}$ would bias the result.
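An unnormalized log-posterior under a uniform box prior, of the kind an ensemble sampler evaluates at each step, might look like the sketch below. It is illustrative rather than the code used for Fig. 6: the function names are hypothetical, `log_likelihood` stands for any callable returning the summed binned log-likelihood (for example the logarithm of $\mathcal{L}_{\textmd{Eff}}$ or $\mathcal{L}_{\textmd{AdHoc}}$ ), and the prior bounds are left to the user because the text does not specify them.

```python
import math


def log_box_prior(theta, bounds):
    """Log of a uniform box prior: 0 inside the box, -inf outside."""
    for value, (lo, hi) in zip(theta, bounds):
        if not lo <= value <= hi:
            return -math.inf
    return 0.0


def log_posterior(theta, log_likelihood, bounds):
    """Unnormalized log-posterior: log prior plus log likelihood."""
    lp = log_box_prior(theta, bounds)
    if lp == -math.inf:
        # Skip the likelihood evaluation for points outside the prior support.
        return lp
    return lp + log_likelihood(theta)
```

With emcee, a function of this form would be passed as the log-probability callable to `emcee.EnsembleSampler(nwalkers, ndim, log_prob_fn, args=...)`, here with 20 walkers and 2 dimensions, with the first 300 steps of the chain discarded as burn-in.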

4.4 Performance

In this section we compare the performance of our likelihood with other treatments available in the literature in terms of the runtime cost per likelihood evaluation for a single bin. We perform our tests using a single Intel® Core™ i5-8350U CPU @ 1.70GHz running code compiled with clang version 6.0.0-1ubuntu2. We compute the likelihood CPU-evaluation time for the following likelihoods: $\mathcal{L}_{\textmd{AdHoc}}$ , modified- $\chi^{2}$ , $\mathcal{L}_{\textmd{G}}$ Glusenkamp:2017rlp , $\mathcal{L}_{\textmd{BB}}$ Barlow:1993dm , and $\mathcal{L}_{\textmd{Eff}}$ . For each of them we consider an increasing number of MC events from $10^{2}$ to $10^{6}$ , an increasing number of background components from $1$ to $10^{3}$ , and increasing counts of data events from $10^{1}$ to $10^{4}$ . Figure 7 shows the behavior of the runtime with respect to these quantities. All likelihoods have a runtime that increases with the number of MC events, as seen in the leftmost panel of Fig. 7, because each likelihood must compute the sum of event weights, which incurs an $\mathcal{O}(m)$ cost, where $m$ is the number of MC events in the bin. At low MC sample sizes the modified- $\chi^{2}$ is faster than $\mathcal{L}_{\textmd{Eff}}$ , since $\mathcal{L}_{\textmd{Eff}}$ requires the evaluation of more expensive special functions. At larger MC sample sizes, however, this additional cost is negligible compared to that of summing the MC weights. In the middle panel of Fig. 7 it can be seen that all likelihoods except $\mathcal{L}_{\textmd{BB}}$ are constant with respect to the number of background components, as they only depend on summary statistics of the weight distribution.
The Barlow-Beeston likelihood, $\mathcal{L}_{\textmd{BB}}$ , incurs an $\mathcal{O}(bd\log d)$ cost for solving a single root finding problem per physical component, where $b$ is the number of background components and $d$ is the number of digits of precision, and therefore is not constant in runtime with respect to the number of components. However, one key difference between $\mathcal{L}_{\textmd{BB}}$ and $\mathcal{L}_{\textmd{Eff}}$ is that $\mathcal{L}_{\textmd{Eff}}$ must compute two summations (the sum of the weights and sum of the square weights), while $\mathcal{L}_{\textmd{BB}}$ needs only to compute a single summation of the MC weights. The rightmost panel of Fig. 7 shows the runtime as a function of the number of data events; for most likelihoods the number of data events, $k$ , enters only in the evaluation of some special functions which for all practical applications are approximately constant in runtime. $\mathcal{L}_{\textmd{G}}$ evaluates a special function which for these purposes can only be computed in $\mathcal{O}(k^{2}m)$ time, resulting in the dependence on the number of data events. The $\mathcal{L}_{\textmd{AdHoc}}$ treatment is always the fastest, but it does not incorporate MC statistical uncertainties in any way.
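The cost structure of $\mathcal{L}_{\textmd{Eff}}$ described above, two $\mathcal{O}(m)$ summations followed by a fixed number of special-function calls, can be illustrated with a per-bin sketch. Eq. (25) is not reproduced in this section, so the code assumes it has the Poisson-gamma form $\beta^{\alpha}\,\Gamma(k+\alpha)/[k!\,(1+\beta)^{k+\alpha}\,\Gamma(\alpha)]$ with $\alpha=\mu^{2}/\sigma^{2}+1$ and $\beta=\mu/\sigma^{2}$ , where $\mu$ is the sum of the weights and $\sigma^{2}$ the sum of the squared weights. The function name `log_l_eff` is hypothetical and is not the interface of the MCLLH package.

```python
import math


def log_l_eff(k, weights):
    """Per-bin log-likelihood in an assumed Poisson-gamma form of Eq. (25).

    k       -- observed number of data events in the bin
    weights -- MC event weights falling in the bin (must be non-empty and positive)
    """
    # Two O(m) passes over the weights: the only cost that grows with MC size.
    mu = sum(weights)
    sigma2 = sum(w * w for w in weights)
    if mu <= 0.0 or sigma2 <= 0.0:
        raise ValueError("bin needs at least one positively weighted MC event")

    alpha = mu * mu / sigma2 + 1.0
    beta = mu / sigma2

    # Constant number of special-function calls; k! is written as Gamma(k + 1).
    return (alpha * math.log(beta)
            + math.lgamma(k + alpha)
            - math.lgamma(k + 1.0)
            - (k + alpha) * math.log1p(beta)
            - math.lgamma(alpha))
```

Working in log space with `math.lgamma` avoids overflow of the gamma functions for large $k$ or large $\mu^{2}/\sigma^{2}$ . For many small, equal weights the value approaches the ordinary Poisson log-likelihood with mean $\mu$ , consistent with the limiting behavior claimed for $\mathcal{L}_{\textmd{Eff}}$ .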

5 Conclusion

The use of MC to estimate expected outcomes of physical processes is nowadays standard practice. By construction, MC distributions are sample observations and subject to statistical fluctuations. MC events are also typically weighted to a particular physics model, and these weights may not be uniform across all events in an observable bin. A direct comparison of MC distributions to data is typically performed using $\mathcal{L}_{\textmd{AdHoc}}$ or $\chi^{2}$ , where the expectation from MC is computed as a sum over weights in a particular observable bin. Such likelihoods neglect the intrinsic MC fluctuations and may lead to vastly underestimated parameter uncertainties in the case of low MC size. A better approach is to use a likelihood that accounts for MC statistical uncertainties.

Along with the definitions of $\mu$ and $\sigma^{2}$ in Eq. (12), the main result of this work is given in Eq. (25). This new $\mathcal{L}_{\textmd{Eff}}$ is motivated by treating the MC realization as an observation of a Poisson random variate, computing the likelihood of the expectation using the MC and marginalizing the Poisson probability of observed data over all possible expectations. It is an analytic extension of the Poisson likelihood that accounts for MC statistical uncertainty under a uniform prior, $\mathcal{P}(\lambda)$ . By assuming that the number of MC events per bin is the outcome of sampling a Poisson-distributed random variable, and that the SPD is a good approximation of the CPD for arbitrary weights, $\mathcal{L}\left(\lambda|\vec{w}(\vec{\theta})\right)$ can be written in terms of $\mu$ and $\sigma^{2}$ as shown in Eq. (18). This allows us to calculate $\mathcal{L}_{\textmd{Eff}}$ , given in Eq. (25), which can be directly substituted for $\mathcal{L}_{\textmd{AdHoc}}$ . Our construction is computationally efficient, exhibits proper limiting behavior, and has excellent coverage properties. In our tests, it outperforms other treatments of MC statistical uncertainty.

Acknowledgements

We thank Thorsten Glüsenkamp for useful discussions and Jean DeMerit for proofreading an early draft. CAA is supported by U.S. National Science Foundation (NSF) grant PHY-1505858. AS and TY are supported in part by NSF grant PHY-1607644 and by the University of Wisconsin Research Committee with funds granted by the Wisconsin Alumni Research Foundation.

Appendix A Summary of likelihood formulas

Bibliography

(1) J. S. Gainer, J. Lykken, K. T. Matchev, S. Mrenna and M. Park, Exploring Theory Space with Monte Carlo Reweighting, JHEP 10 (2014) 078 [1404.7129].
(2) L. Lyons, STATISTICS FOR NUCLEAR AND PARTICLE PHYSICISTS. 1986.
(3) S. D. Poisson, Recherches sur la probabilité des jugements en matière criminelle et en matière civile precédées des règles générales du calcul des probabilités. Bachelier, 1837.
(4) R. J. Barlow and C. Beeston, Fitting using finite Monte Carlo samples, Comput. Phys. Commun. 77 (1993) 219.
(5) K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, HistFactory: A tool for creating statistical models for use with RooFit and RooStats.
(6) D. Chirkin, Likelihood description for comparing data with simulation of limited statistics, 1304.0735.
(7) T. Glüsenkamp, Probabilistic treatment of the uncertainty from the finite size of weighted Monte Carlo data, Eur. Phys. J. Plus 133 (2018) 218 [1712.01293].
(8) C. Argüelles, A. Schneider and T. Yuan, “MCLLH.” https://github.com/austinschneider/MCLLH, 2019.
