Inverse Problems in PDF Determinations

Alessandro Candido; Luigi Del Debbio; Tommaso Giani; Giacomo Petrillo

arXiv:2302.14731·hep-lat·March 1, 2023

TL;DR

This paper introduces a Bayesian framework to address the challenging inverse problem of determining Parton Distribution Functions from limited data, emphasizing uncertainty quantification and initial testing results.

Contribution

It presents a novel Bayesian approach for PDF determination and provides initial results demonstrating its effectiveness in solving inverse problems.

Findings

1. Bayesian method successfully applied to PDF determination

2. Initial closure test results show promising accuracy

3. Framework enhances uncertainty quantification in inverse problems

Abstract

The determination of Parton Distribution Functions from a finite set of data is a typical example of an inverse problem. Inverse problems are notoriously difficult to solve, in particular when a robust determination of the uncertainty in the result is needed. We present a Bayesian framework to deal with this problem and discuss first results from a closure test.

Equations

$$y_{I}=\int_{0}^{1}dx\,C_{Ij}(x)f_{j}(x),$$

$$y_{I}=\sum_{j=1}^{N}(\mathrm{FK})_{Ij}f(x_{j}),\quad j=1,\dots,N$$

$$f\sim\mathcal{GP}(m,k),$$

$\mathbb{E}[f(x)]$, $\mathbb{E}[f(x^{\prime})]$, $\mathrm{Cov}[f(x),f(x^{\prime})]$

$$\mathbf{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{N}\end{pmatrix},\quad\mathbf{x^{*}}=\begin{pmatrix}x^{*}_{1}\\ \vdots\\ x^{*}_{M}\end{pmatrix},$$

$$\mathbf{f}=f(\mathbf{x})\in\mathbb{R}^{N},\quad\mathbf{f^{*}}=f(\mathbf{x^{*}})\in\mathbb{R}^{M},$$

$\begin{pmatrix}\mathbf{m}\\ \mathbf{m^{*}}\end{pmatrix}$

$$\Sigma=\begin{pmatrix}k(\mathbf{x},\mathbf{x}^{T})&k(\mathbf{x},\mathbf{x^{*}}^{T})\\ k(\mathbf{x^{*}},\mathbf{x}^{T})&k(\mathbf{x^{*}},\mathbf{x^{*}}^{T})\end{pmatrix}=\begin{pmatrix}\Sigma_{xx}&\Sigma_{xx^{*}}\\ \Sigma_{x^{*}x}&\Sigma_{x^{*}x^{*}}\end{pmatrix}.$$

$$\mathbf{y}=f(\mathbf{x}),$$

$$(\mathbf{f^{*}}\mid\mathbf{f}=\mathbf{y})\sim\mathcal{N}(\tilde{m}^{*},\tilde{\Sigma}_{x^{*}x^{*}}),$$

$\tilde{m}^{*}$, $\tilde{\Sigma}_{x^{*}x^{*}}$

$$\epsilon\sim\mathcal{N}(0,C_{Y}),$$

$$\mathbf{y}=(\mathrm{FK})\mathbf{f}+\epsilon,$$

$$\mathrm{Cov}=\begin{pmatrix}\Sigma&0\\ 0&C_{Y}\end{pmatrix},$$

$$(\mathbf{f}\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y})\sim\mathcal{N}(\tilde{m},\tilde{\Sigma}_{xx}),$$

$\tilde{m}$, $\tilde{\Sigma}_{xx}$

$$\tilde{\Sigma}_{xx}^{-1}=\Sigma_{xx}^{-1}+(\mathrm{FK})^{T}C_{Y}^{-1}(\mathrm{FK}).$$

$$(\mathbf{f^{*}}\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y})\sim\mathcal{N}(\tilde{m}^{*},\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}}),$$

$\tilde{m}^{*}$, $\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}}$

$\Delta m$, $\Delta m^{*}$

$$\Delta m^{*}=\Sigma_{x^{*}x}\Sigma_{xx}^{+}\Delta m,$$

$$\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}}=\tilde{\Sigma}_{x^{*}x^{*}}+\Sigma_{x^{*}x}\Sigma_{xx}^{+}\tilde{\Sigma}_{xx}\Sigma_{xx}^{+}\Sigma_{xx^{*}}.$$

$$(f\mid\theta)\sim\mathcal{GP}(m_{\theta},k_{\theta}),\quad\theta\sim p_{\theta}.$$

$$p(\theta\mid\mathbf{f}=\mathbf{y})\propto p(\mathbf{f}=\mathbf{y}\mid\theta)\,p_{\theta}(\theta).$$

$$p(\mathbf{f}=\mathbf{y}\mid\theta)=\bigl[\det 2\pi\Sigma_{xx}(\theta)\bigr]^{-1/2}\exp\left(-\frac{1}{2}(\mathbf{y}-\mathbf{m}(\theta))^{T}\Sigma_{xx}(\theta)^{-1}(\mathbf{y}-\mathbf{m}(\theta))\right),$$

$p\bigl(\theta\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\bigr)$

$$=\int d\mathbf{f}\,d\epsilon\,p\bigl((\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\mid\mathbf{f},\epsilon,\theta\bigr)\,p(\mathbf{f},\epsilon\mid\theta)\,p_{\theta}(\theta).$$

$p(\mathbf{f^{*}},\theta\mid\mathbf{f}=\mathbf{y})$

Taxonomy

Topics: Particle physics theoretical and experimental studies

Full text

1 Introduction

The determination of Parton Distribution Functions (PDFs) from lattice simulations requires the solution of an inverse problem, where a function of a real variable $f(x)$ needs to be reconstructed from a finite set of data $\{y_{I};I=1,\ldots,N_{\mathrm{dat}}\}$. Following the original ideas in Ref. [1], the lattice data are correlators of fields, which are computed in Monte Carlo simulations. Factorization theorems allow us to express these correlators as convolutions of the PDFs,

$$y_{I}=\int_{0}^{1}dx\,C_{Ij}(x)f_{j}(x)\,,\tag{1}$$

where $C_{Ij}(x)$ are Wilson coefficients that can be obtained in perturbation theory, and the index $j$ is summed over all partons described by the PDFs; see, e.g., Ref. [2] for a recent determination of PDFs from experimental data and Ref. [3] for a (simple) discussion of the factorization formula for lattice data in a toy model scalar field theory.

We argued in Ref. [4] that a Bayesian approach is best suited to characterise the knowledge about the function $f$, taking into account any prior assumptions and the existing data. In that paper, the integral in Eq. 1 was evaluated as a discrete sum on a grid of points $x_{j}$,

$$y_{I}=\sum_{j=1}^{N}(\mathrm{FK})_{Ij}f(x_{j})\,,\quad j=1,\dots,N\,,\tag{2}$$

and a multivariate Gaussian with covariance $C_{X}$ was chosen as a prior for the discrete set of values $f(x_{j})$. Given the set of data, Bayes’ theorem provides the posterior distribution for the variables $f(x_{j})$.
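As an illustration of the discretization step, the following sketch builds an FK-like matrix from a coefficient function using trapezoidal quadrature weights, for a single parton flavour. The function name `fk_from_coefficient`, the toy coefficients $C_{I}(x)=x^{I}$ and the test function $f(x)=x(1-x)$ are assumptions made for this example only; the quadrature actually used in Ref. [4] is not specified here.

```python
import numpy as np

def fk_from_coefficient(coeff, x_grid):
    """Return (FK)_{Ij} = w_j * C_I(x_j), with trapezoidal weights w_j.

    `coeff` maps the grid (shape (N,)) to an array of shape (N_dat, N).
    """
    dx = np.diff(x_grid)
    w = np.zeros_like(x_grid)
    w[:-1] += 0.5 * dx   # left half of each interval
    w[1:] += 0.5 * dx    # right half of each interval
    return coeff(x_grid) * w

def toy_coeff(x):
    # Hypothetical coefficient functions C_I(x) = x**I for I = 0, 1, 2.
    return np.stack([x**I for I in range(3)])

x_grid = np.linspace(0.0, 1.0, 201)
FK = fk_from_coefficient(toy_coeff, x_grid)

f_vals = x_grid * (1.0 - x_grid)   # toy f(x), not a realistic PDF
y = FK @ f_vals                    # discrete sum replacing the integral
```

For this toy input the exact integrals are $1/6$, $1/12$ and $1/20$, so `y` can be compared directly against them to check the quadrature error.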

In this work, we are going to consider the ‘continuum limit’ of this approach, where the function $f$ is treated as a Gaussian Process (GP). GPs are characterized by a mean function $m(x)$ and a covariance function $k(x,x^{\prime})$,

$$f\sim\mathcal{GP}(m,k)\,,\tag{3}$$

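so that $x$ can be thought of as an index, and $f(x)$ as a stochastic variable indexed by $x$. For any pair of indices $x$ and $x^{\prime}$, the stochastic variables $f(x)$ and $f(x^{\prime})$ are Gaussian variables, with

$$\mathbb{E}[f(x)]=m(x)\,,\tag{4}$$

$$\mathbb{E}[f(x^{\prime})]=m(x^{\prime})\,,\tag{5}$$

$$\mathrm{Cov}[f(x),f(x^{\prime})]=k(x,x^{\prime})\,.\tag{6}$$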

Note that the kernel $k$, by determining the correlation between the values of $f$ at different values of $x$, encodes the smoothness of the function.
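To make the role of $m$ and $k$ concrete, the sketch below draws samples of $f$ on a finite grid and compares their empirical covariance with the kernel matrix. The squared-exponential kernel, its parameters and the small diagonal jitter are assumptions chosen for illustration; they are not the kernel used later in this work.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma=1.0, ell=0.2):
    # Assumed illustrative kernel: k(x, x') = sigma^2 exp(-(x - x')^2 / (2 ell^2)).
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0.0, 1.0, 50)
m = np.zeros_like(x)        # zero mean function evaluated on the grid
K = sq_exp_kernel(x, x)     # Cov[f(x_i), f(x_j)] = k(x_i, x_j)

rng = np.random.default_rng(0)
# A small jitter keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-8 * np.eye(x.size))
samples = m + (L @ rng.standard_normal((x.size, 2000))).T  # one sample per row

emp_cov = np.cov(samples, rowvar=False)   # should approach K for many samples
```

A shorter correlation length `ell` yields samples that vary more rapidly in $x$, which is the sense in which the kernel controls smoothness.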

2 Inference Using Gaussian Processes

Before applying GPs to the determination of PDFs, let us briefly summarise their general usage for inference purposes. When trying to determine a function $f$, we are going to assume that the prior distribution of the function is given by a GP, as specified in Eq. 3. Let us now consider two sets of indices,

$$\mathbf{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{N}\end{pmatrix},\quad\mathbf{x^{*}}=\begin{pmatrix}x^{*}_{1}\\ \vdots\\ x^{*}_{M}\end{pmatrix},\tag{7}$$

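we can construct two vectors, which contain the values of $f$ evaluated at the points $\mathbf{x}$ and $\mathbf{x^{*}}$ respectively,

$$\mathbf{f}=f(\mathbf{x})\in\mathbb{R}^{N}\,,\quad\mathbf{f^{*}}=f(\mathbf{x^{*}})\in\mathbb{R}^{M}\,,\tag{8}$$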

where the expressions above are a short-hand for $f_{i}=f(x_{i})$ and $f^{*}_{i}=f(x^{*}_{i})$ respectively. The $(N+M)$-dimensional vector $\displaystyle\begin{pmatrix}\mathbf{f}\\ \mathbf{f^{*}}\end{pmatrix}$ is a stochastic variable that is distributed according to the prior Gaussian distribution, with mean (the notation here and in Eq. 10 is a generalization of the one explained in Eq. 8)

$$\begin{pmatrix}\mathbf{m}\\ \mathbf{m^{*}}\end{pmatrix}=\begin{pmatrix}m(\mathbf{x})\\ m(\mathbf{x^{*}})\end{pmatrix}\tag{9}$$

and covariance

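$$\Sigma=\begin{pmatrix}k(\mathbf{x},\mathbf{x}^{T})&k(\mathbf{x},\mathbf{x^{*}}^{T})\\ k(\mathbf{x^{*}},\mathbf{x}^{T})&k(\mathbf{x^{*}},\mathbf{x^{*}}^{T})\end{pmatrix}=\begin{pmatrix}\Sigma_{xx}&\Sigma_{xx^{*}}\\ \Sigma_{x^{*}x}&\Sigma_{x^{*}x^{*}}\end{pmatrix}.\tag{10}$$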

When considering a vector $\mathbf{f}$ containing a finite number of variables, this is exactly the formalism used in Ref. [4]; the only difference being that, in the case of a GP, the mean and the covariance are dictated by the functions $m$ and $k$ that characterise the GP, as shown in Eq. 10.

In what follows, we are going to consider two different examples where Bayesian inference can be used.

1. We will first consider the case where the values of $\mathbf{f}$ are exactly known, and we will use this information to construct a posterior distribution for $\mathbf{f^{*}}$. We will refer to this case as point-wise data.

2. We will then analyse the case where the function $f$ itself is not known, but we are given some data like the datapoints $y_{I}$ discussed above, i.e., data that is obtained as a linear transformation of the values $\mathbf{f}$. In this scenario, we will compute the posterior distribution for the values of $\mathbf{f}$ and $\mathbf{f^{*}}$ given the data $y_{I}$ and their covariance matrix. We will refer to this scenario as lattice data.

Point-wise data.

It is interesting to remark that, because of the correlation between the values of $f$ at different values of $x$, knowing the values of $\mathbf{f}$ yields some information on the values of $\mathbf{f^{*}}$. In a Bayesian framework, this information is extracted by computing the posterior distribution for the stochastic variables $\mathbf{f^{*}}$. Let us assume that the values of $f$ on the $\mathbf{x}$ index set are given by a vector $\mathbf{y}$,

$$\mathbf{y}=f(\mathbf{x})\,,\tag{11}$$

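a standard calculation, using the Schur complement of $\Sigma_{xx}$, yields the posterior distribution for the values of $f$ on the $\mathbf{x^{*}}$ set,

$$(\mathbf{f^{*}}\mid\mathbf{f}=\mathbf{y})\sim\mathcal{N}(\tilde{m}^{*},\tilde{\Sigma}_{x^{*}x^{*}})\,,\tag{12}$$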

where

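$$\tilde{m}^{*}=\mathbf{m^{*}}+\Sigma_{x^{*}x}\Sigma_{xx}^{+}(\mathbf{y}-\mathbf{m})\,,\tag{13}$$

$$\tilde{\Sigma}_{x^{*}x^{*}}=\Sigma_{x^{*}x^{*}}-\Sigma_{x^{*}x}\Sigma_{xx}^{+}\Sigma_{xx^{*}}\,,\tag{14}$$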

and $\Sigma_{xx}^{+}$ denotes the Moore-Penrose pseudoinverse of $\Sigma_{xx}$.
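A minimal numerical sketch of this conditioning step is given below, using the standard conditional-Gaussian formulas with the pseudoinverse of $\Sigma_{xx}$. The helper name `condition_pointwise`, the squared-exponential kernel and the toy data are assumptions introduced for the example.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma=1.0, ell=0.3):
    # Assumed illustrative kernel, not the one used in the closure test.
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def condition_pointwise(x, y, x_star, mean, kernel):
    """Posterior mean and covariance of f(x_star) given f(x) = y exactly."""
    S_xx = kernel(x, x)
    S_sx = kernel(x_star, x)
    S_ss = kernel(x_star, x_star)
    S_xx_pinv = np.linalg.pinv(S_xx)          # Moore-Penrose pseudoinverse
    m_star = mean(x_star) + S_sx @ S_xx_pinv @ (y - mean(x))
    cov_star = S_ss - S_sx @ S_xx_pinv @ S_sx.T
    return m_star, cov_star

def zero_mean(x):
    return np.zeros_like(x)

x = np.array([0.1, 0.4, 0.7])
y = np.sin(2.0 * np.pi * x)                   # toy "exactly known" values
x_star = np.array([0.4, 0.55])                # first point coincides with x[1]
m_star, cov_star = condition_pointwise(x, y, x_star, zero_mean, sq_exp_kernel)
```

Because the first test point coincides with a point where $f$ is known, its posterior mean reproduces the known value and its posterior variance vanishes, while the second point retains a finite uncertainty.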

Lattice data.

As discussed above, the lattice data, with their Monte Carlo covariance, are typically expressed as linear functions of the PDFs. The knowledge of a variable that depends linearly on the GP can be incorporated in our formalism by introducing the stochastic variable

$$\epsilon\sim\mathcal{N}(0,C_{Y})\,,\tag{15}$$

and imposing that

$$\mathbf{y}=(\mathrm{FK})\mathbf{f}+\epsilon\,,\tag{16}$$

where $\mathbf{y}$ are the central values and $C_{Y}$ is the covariance matrix of the lattice estimates. The linear dependence of $\mathbf{y}$ on $\mathbf{f}$ is encoded in the matrix $(\mathrm{FK})$. Note that in Eq. 16 we assume that the observables, defined in Eq. 2, are computed using the function $f$ evaluated on the points in $\mathbf{x}$. Prior knowledge of the function and the Monte Carlo covariance of the data must be uncorrelated, and therefore the covariance matrix of the three sets of stochastic variables $(\mathbf{f},\mathbf{f^{*}},\mathbf{\epsilon})$ is a block-diagonal $(N+M+N_{\mathrm{dat}})\times(N+M+N_{\mathrm{dat}})$ matrix

$$\mathrm{Cov}=\begin{pmatrix}\Sigma&0\\ 0&C_{Y}\end{pmatrix},\tag{17}$$

where $\Sigma$ is the $(N+M)\times(N+M)$ matrix introduced in Eq. 10.

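Conditioning on the observed value $y$ in Eq. 16, and marginalizing $\mathbf{f^{*}}$ and $\mathbf{\epsilon}$, yields a Gaussian posterior for $\mathbf{f}$,

$$(\mathbf{f}\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y})\sim\mathcal{N}(\tilde{m},\tilde{\Sigma}_{xx})\,,\tag{18}$$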

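with mean $\tilde{m}$ and covariance $\tilde{\Sigma}_{xx}$, given by

$$\tilde{m}=\mathbf{m}+\Sigma_{xx}(\mathrm{FK})^{T}\bigl[(\mathrm{FK})\Sigma_{xx}(\mathrm{FK})^{T}+C_{Y}\bigr]^{-1}\bigl(\mathbf{y}-(\mathrm{FK})\mathbf{m}\bigr)\,,\tag{19}$$

$$\tilde{\Sigma}_{xx}=\Sigma_{xx}-\Sigma_{xx}(\mathrm{FK})^{T}\bigl[(\mathrm{FK})\Sigma_{xx}(\mathrm{FK})^{T}+C_{Y}\bigr]^{-1}(\mathrm{FK})\Sigma_{xx}\,.\tag{20}$$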

Note that Eq. 20 can equivalently be written as

$$\tilde{\Sigma}_{xx}^{-1}=\Sigma_{xx}^{-1}+(\mathrm{FK})^{T}C_{Y}^{-1}(\mathrm{FK})\,.\tag{21}$$

Eqs. 19 and 21 were already obtained in Ref. [4], while Eq. 20 provides an alternative expression for the posterior covariance.
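The equivalence between the two expressions for the posterior covariance can be checked numerically. The sketch below writes the posterior for $\mathbf{f}$ in the standard conditional-Gaussian form for a linear observation with Gaussian noise, and compares it with the precision form $\Sigma_{xx}^{-1}+(\mathrm{FK})^{T}C_{Y}^{-1}(\mathrm{FK})$. The function name `condition_linear` and the random test matrices are assumptions for illustration, and the notation may differ from the one used in Ref. [4].

```python
import numpy as np

def condition_linear(m, S_xx, FK, C_Y, y):
    """Posterior of f given (FK) f + eps = y, with eps ~ N(0, C_Y)."""
    A = FK @ S_xx @ FK.T + C_Y                   # covariance of the data
    gain = np.linalg.solve(A, FK @ S_xx).T       # S_xx FK^T A^{-1}
    m_post = m + gain @ (y - FK @ m)
    S_post = S_xx - gain @ FK @ S_xx
    return m_post, S_post

rng = np.random.default_rng(1)
N, N_dat = 6, 4
B = rng.standard_normal((N, N))
S_xx = B @ B.T + np.eye(N)                       # random positive-definite prior
FK = rng.standard_normal((N_dat, N))             # toy FK table
C_Y = 0.1 * np.eye(N_dat)
m = np.zeros(N)
y = rng.standard_normal(N_dat)

m_post, S_post = condition_linear(m, S_xx, FK, C_Y, y)

# Precision (information) form of the same posterior.
precision = np.linalg.inv(S_xx) + FK.T @ np.linalg.inv(C_Y) @ FK
m_info = np.linalg.solve(
    precision, np.linalg.solve(S_xx, m) + FK.T @ np.linalg.solve(C_Y, y)
)
```

The first form only requires solving an $N_{\mathrm{dat}}\times N_{\mathrm{dat}}$ system and does not need $\Sigma_{xx}$ to be invertible, which can be convenient when the prior covariance is ill-conditioned.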

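A similar derivation yields a multivariate Gaussian for the posterior distribution of $\mathbf{f^{*}}$,

$$(\mathbf{f^{*}}\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y})\sim\mathcal{N}(\tilde{m}^{*},\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}})\,,\tag{22}$$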

where in this case

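$$\tilde{m}^{*}=\mathbf{m^{*}}+\Sigma_{x^{*}x}(\mathrm{FK})^{T}\bigl[(\mathrm{FK})\Sigma_{xx}(\mathrm{FK})^{T}+C_{Y}\bigr]^{-1}\bigl(\mathbf{y}-(\mathrm{FK})\mathbf{m}\bigr)\,,\tag{23}$$

$$\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}}=\Sigma_{x^{*}x^{*}}-\Sigma_{x^{*}x}(\mathrm{FK})^{T}\bigl[(\mathrm{FK})\Sigma_{xx}(\mathrm{FK})^{T}+C_{Y}\bigr]^{-1}(\mathrm{FK})\Sigma_{xx^{*}}\,.\tag{24}$$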

Introducing the corrections to the mean of the process due to Bayesian inference,

$$\Delta m=\tilde{m}-\mathbf{m}\,,\qquad\Delta m^{*}=\tilde{m}^{*}-\mathbf{m^{*}}\,,$$

we have

$$\Delta m^{*}=\Sigma_{x^{*}x}\Sigma_{xx}^{+}\Delta m\,,$$

and

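$$\tilde{\Sigma}_{x^{*}x^{*}}^{\mathrm{lat}}=\tilde{\Sigma}_{x^{*}x^{*}}+\Sigma_{x^{*}x}\Sigma_{xx}^{+}\tilde{\Sigma}_{xx}\Sigma_{xx}^{+}\Sigma_{xx^{*}}\,.$$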
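These relations state that the lattice-data posterior at the unseen points can be obtained by propagating the posterior of $\mathbf{f}$ through the point-wise conditioning formulas. The sketch below checks this numerically by comparing a direct conditional-Gaussian computation for $\mathbf{f^{*}}$ with the propagated one. The kernel, grids and random FK table are assumptions for the example, and the direct expressions are the standard ones for a linear Gaussian observation rather than a transcription of the original formulas.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma=1.0, ell=0.15):
    # Assumed illustrative kernel.
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

x = np.linspace(0.05, 0.95, 6)
x_star = np.array([0.3, 0.62])
S_xx = sq_exp_kernel(x, x)
S_sx = sq_exp_kernel(x_star, x)
S_ss = sq_exp_kernel(x_star, x_star)

rng = np.random.default_rng(2)
FK = rng.uniform(0.0, 1.0, size=(4, x.size))    # toy FK table
C_Y = 0.05 * np.eye(4)
m, m_star = np.zeros(x.size), np.zeros(x_star.size)
y = rng.standard_normal(4)

# Direct conditioning of f and f* on the noisy linear data.
A_inv = np.linalg.inv(FK @ S_xx @ FK.T + C_Y)
m_post = m + S_xx @ FK.T @ A_inv @ (y - FK @ m)
S_post = S_xx - S_xx @ FK.T @ A_inv @ FK @ S_xx
m_star_post = m_star + S_sx @ FK.T @ A_inv @ (y - FK @ m)
S_lat_direct = S_ss - S_sx @ FK.T @ A_inv @ FK @ S_sx.T

# Propagation through the point-wise formulas.
S_pinv = np.linalg.pinv(S_xx)
delta_m_star = S_sx @ S_pinv @ (m_post - m)
S_tilde_ss = S_ss - S_sx @ S_pinv @ S_sx.T
S_lat_via_f = S_tilde_ss + S_sx @ S_pinv @ S_post @ S_pinv @ S_sx.T
```

The covariance at the unseen points is the sum of the interpolation uncertainty that would remain if $\mathbf{f}$ were known exactly and the residual uncertainty on $\mathbf{f}$ itself.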

3 Hyperparameters

In what follows, the mean value and the kernel of the GPs depend on a set of hyperparameters that we denote by $\theta$, with their prior distribution, $p_{\theta}(\theta)$. The $\theta$-dependent mean and kernel being henceforth denoted $m_{\theta}(x)$ and $k_{\theta}(x,x^{\prime})$, we define, in analogy with Eq. (3),

$$(f\mid\theta)\sim\mathcal{GP}(m_{\theta},k_{\theta})\,,\qquad\theta\sim p_{\theta}\,.\tag{27}$$

Every instance of the variable $\theta$ fully defines the process $f$. The hyperparameters and their probability distribution become part of our Bayesian analysis. The posterior distribution for the hyperparameters, given the point-wise values $\mathbf{f}$, is simply

$$p(\theta\mid\mathbf{f}=\mathbf{y})\propto p(\mathbf{f}=\mathbf{y}\mid\theta)\,p_{\theta}(\theta)\,.\tag{28}$$

Note that the first factor on the right-hand side of Eq. (28) is

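$$p(\mathbf{f}=\mathbf{y}\mid\theta)=\bigl[\det 2\pi\Sigma_{xx}(\theta)\bigr]^{-1/2}\exp\left(-\frac{1}{2}(\mathbf{y}-\mathbf{m}(\theta))^{T}\Sigma_{xx}(\theta)^{-1}(\mathbf{y}-\mathbf{m}(\theta))\right)\,,\tag{29}$$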

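where we have written explicitly the dependence of $\mathbf{m}$ and $\Sigma_{xx}$ on the hyperparameters $\theta$. Similarly, mirroring the discussion in the previous Section, the posterior probability of the hyperparameters can be computed in the case where we know the central value and the covariance of lattice data that are linear functions of the Gaussian process,

$$p\bigl(\theta\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\bigr)\propto\int d\mathbf{f}\,d\epsilon\,p\bigl((\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\mid\mathbf{f},\epsilon,\theta\bigr)\,p(\mathbf{f},\epsilon\mid\theta)\,p_{\theta}(\theta)\,.$$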
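In practice, both hyperparameter posteriors require the logarithm of a Gaussian likelihood. A minimal sketch is given below; it relies on the fact that the Gaussian integral over $\mathbf{f}$ and $\epsilon$ gives $\mathbf{y}\sim\mathcal{N}\bigl((\mathrm{FK})\mathbf{m},(\mathrm{FK})\Sigma_{xx}(\mathrm{FK})^{T}+C_{Y}\bigr)$ for lattice data. The function name, the squared-exponential kernel, the zero mean and the parametrization $\theta=(\sigma,\ell)$ are assumptions made for the example.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma, ell):
    # Assumed illustrative kernel with hyperparameters theta = (sigma, ell).
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(theta, x, y, FK=None, C_Y=None):
    """log p(data | theta) for point-wise data (FK is None) or lattice data."""
    sigma, ell = theta
    S = sq_exp_kernel(x, x, sigma, ell)
    m = np.zeros_like(x)                      # zero prior mean (assumption)
    if FK is None:
        cov, resid = S, y - m                 # point-wise: y = f(x)
    else:
        cov, resid = FK @ S @ FK.T + C_Y, y - FK @ m
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, resid)         # alpha @ alpha = resid^T cov^{-1} resid
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (alpha @ alpha + log_det + y.size * np.log(2.0 * np.pi))

# Toy scan over the correlation length for point-wise data.
x = np.linspace(0.1, 0.9, 5)
y = np.sin(2.0 * np.pi * x)
ells = [0.05, 0.2, 0.8]
scores = [log_marginal_likelihood((1.0, ell), x, y) for ell in ells]
```

Adding $\log p_{\theta}(\theta)$ to this quantity gives the unnormalized log-posterior of the hyperparameters, which can then be maximized or sampled.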

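We will also use the joint probability distribution of the values of the function at the unseen points, $\mathbf{f^{*}}$, and the hyperparameters. In the case of the point-wise data we obtain

$$p(\mathbf{f^{*}},\theta\mid\mathbf{f}=\mathbf{y})=p(\mathbf{f^{*}}\mid\theta,\mathbf{f}=\mathbf{y})\,p(\theta\mid\mathbf{f}=\mathbf{y})\,.$$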

The first factor on the right-hand side has been computed in Eq. (12), while for the second factor we can use the expression above in Eq. (28). Finally, let us consider the case where the data is the usual lattice data, as described in Section 2. The joint posterior distribution in this case is

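$$p\bigl(\mathbf{f^{*}},\theta\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\bigr)=p\bigl(\mathbf{f^{*}}\mid\theta,(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\bigr)\,p\bigl(\theta\mid(\mathrm{FK})\mathbf{f}+\epsilon=\mathbf{y}\bigr)\,.$$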

For completeness, we remind the reader that the first factor on the right-hand side has been computed in Eq. 22, while the second factor is in Eq. 3.

4 Numerical Implementation

In these proceedings, we are going to focus on some preliminary ‘closure tests’ of the method, where we generate artificial data from some known, input PDFs and then use Bayesian inference to reconstruct the PDFs using the data. At this stage, the input PDFs do not need to be realistic as we are mainly interested in testing the methodology. We are going to use a set of these artificial PDFs in the evolution basis (see, e.g., Ref. [2] for their definition in terms of the PDFs in the flavor basis; the singlet PDF is traditionally denoted by $\Sigma$ and should not be confused with the covariance matrix introduced above),

[TABLE]

to generate 3000 data points using a fixed set of FK tables. These data are then used to constrain the GPs, as dictated by the inference framework described above. The result of the inference procedure can then be compared to the input PDFs in order to assess the effectiveness of the methodology.

Following the prescription in Section 2, we associate to each PDF a GP with zero mean and Gibbs kernel [5]

[TABLE]

where

[TABLE]

$\sigma$ and $\ell_{0}$ are hyperparameters and $\varepsilon>0$ regularizes the singularity at $x=0$.
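For orientation, the sketch below implements the one-dimensional Gibbs kernel in the form given in Ref. [5], $k(x,x^{\prime})=\sigma^{2}\sqrt{2\ell(x)\ell(x^{\prime})/(\ell(x)^{2}+\ell(x^{\prime})^{2})}\,\exp\bigl[-(x-x^{\prime})^{2}/(\ell(x)^{2}+\ell(x^{\prime})^{2})\bigr]$. The length-scale function $\ell(x)=\ell_{0}(x+\varepsilon)$ used here is an assumption chosen so that $\varepsilon$ keeps $\ell$ positive at $x=0$; the precise form adopted in this work may differ.

```python
import numpy as np

def length_scale(x, ell0, eps):
    # Assumed form of the x-dependent correlation length: ell(x) = ell0 * (x + eps).
    return ell0 * (x + eps)

def gibbs_kernel(xa, xb, sigma=1.0, ell0=1.0, eps=1e-3):
    """One-dimensional Gibbs kernel with a position-dependent length scale."""
    la = length_scale(xa, ell0, eps)[:, None]
    lb = length_scale(xb, ell0, eps)[None, :]
    denom = la**2 + lb**2
    prefactor = np.sqrt(2.0 * la * lb / denom)
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return sigma**2 * prefactor * np.exp(-d2 / denom)

x = np.logspace(-4, 0, 20)          # log-spaced grid, typical for PDFs
K = gibbs_kernel(x, x, sigma=2.0)   # prior covariance matrix on the grid
```

With a length scale that grows with $x$, the process is allowed to vary rapidly at small $x$ while remaining smooth at large $x$.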

Physical Constraints.

Since the functions $T_{i}$ do not satisfy any particular constraint, we associate a GP to each of them

[TABLE]

On the other hand, the valence PDFs need to satisfy the constraints dictated by the valence sum rules:

[TABLE]

In order to implement these constraints, we associate a GP to the indefinite integrals of $V_{i}$, denoted as $\tilde{V}_{i}$, so that

[TABLE]

where $\kappa(x,x^{\prime})=\ell(x)k(x,x^{\prime})\ell(x^{\prime})$. The sum rules can be expressed as linear constraints,

[TABLE]

The momentum sum rule,

[TABLE]

is implemented by defining

[TABLE]

and requiring

[TABLE]

In choosing this particular parametrization, we have introduced two hyperparameters $a_{\Sigma}$ and $a_{g}$ that control the asymptotic behaviour of $x\Sigma(x)$ and $xg(x)$ at small $x$. Since the PDFs vanish at $x=1$, we have one final set of constraints

[TABLE]

Workflow.

The full workflow for the closure test is as follows.

• The input PDFs are generated by sampling the prior distribution for the hyperparameters, and then sampling the GPs.

• The artificial data are obtained from Eq. 16, using a fixed set of FK tables. Each data point is assigned an independent error equal to 10% of the range of the data points. In this particular example, the FK tables are such that points with $x<10^{-4}$ do not contribute to the observables.

• The hyperparameters are fitted by computing the maximum of their posterior distribution.

• Using the fitted value for the hyperparameters, we compute the posterior mean and covariance of the GP for each PDF. These posterior distributions, characterized by their central values and covariances, are what we call ‘fitted PDFs’ in this framework.

• The result of the Bayesian inference for the data points is computed using the fitted PDFs and the FK tables.
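The steps above can be sketched on a toy problem with a single function, as shown below. Everything specific in this example is an assumption made for illustration: the squared-exponential kernel, the discrete uniform prior over a small grid of correlation lengths (so that the posterior maximum coincides with the maximum of the marginal likelihood), the Gaussian-shaped FK table and the number of data points. It is not the setup used to produce the results reported here.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma, ell):
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def log_marginal(y, FK, S, C_Y):
    # Zero prior mean: y ~ N(0, FK S FK^T + C_Y).
    A = FK @ S @ FK.T + C_Y
    _, logdet = np.linalg.slogdet(2.0 * np.pi * A)
    return -0.5 * y @ np.linalg.solve(A, y) - 0.5 * logdet

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)

# 1. Input function: sample the hyperparameter prior, then the GP.
ell_grid = np.array([0.05, 0.1, 0.2, 0.4])
ell_true = rng.choice(ell_grid)
S_true = sq_exp_kernel(x, x, 1.0, ell_true)
f_true = np.linalg.cholesky(S_true + 1e-8 * np.eye(x.size)) @ rng.standard_normal(x.size)

# 2. Artificial data from a fixed toy FK table, with 10%-of-range errors.
centres = np.linspace(0.05, 0.95, 30)
FK = np.exp(-0.5 * ((centres[:, None] - x[None, :]) / 0.05) ** 2)
FK /= FK.sum(axis=1, keepdims=True)
y_clean = FK @ f_true
noise = 0.1 * (y_clean.max() - y_clean.min())
C_Y = noise**2 * np.eye(centres.size)
y = y_clean + noise * rng.standard_normal(centres.size)

# 3. Hyperparameter fit: maximum of the posterior over the grid (flat prior).
scores = [log_marginal(y, FK, sq_exp_kernel(x, x, 1.0, e), C_Y) for e in ell_grid]
ell_fit = ell_grid[int(np.argmax(scores))]

# 4. Fitted function: posterior mean and covariance at the fitted hyperparameter.
S = sq_exp_kernel(x, x, 1.0, ell_fit)
gain = np.linalg.solve(FK @ S @ FK.T + C_Y, FK @ S).T
f_mean = gain @ y
f_cov = S - gain @ FK @ S

# 5. Prediction for the data points and normalized residuals.
y_fit = FK @ f_mean
pull = (y_fit - y) / noise
```

Comparing `f_mean` and the diagonal of `f_cov` with `f_true` then plays the role of the comparison between fitted and input PDFs in the closure test.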

Results.

The results of this closure test are summarised in Fig. 1. The plots show general agreement between the input PDFs, represented by the coloured lines, and the results of the Bayesian inference, which are reported in the figure by drawing the $\pm 1\sigma$ interval at each point. As expected, the points for $x<10^{-4}$ are less constrained by the data since they do not influence the data points. The data points themselves are well reproduced by the posterior of the GP, as shown in the bottom right plot. Some discrepancies in the PDFs are visible, e.g., for $T_{3}$. A more quantitative analysis is postponed to future work.

5 Conclusions

In these proceedings, we have sketched the application of GPs for the solution of inverse problems like the ones that appear in the determination of PDFs from data. Our methodology can be applied to both the cases of lattice data and experimental data. It provides a mathematically robust tool to reconstruct the PDFs while taking into account statistical errors and prior knowledge. Assessing the applicability of this method, and its potential sources of bias, requires a detailed, systematic investigation.

The fact that an infinitely wide neural network behaves like a GP suggests that there is a rigorous connection between the methodology presented here and in Ref. [4] and the fits based on neural network parametrizations that have been in use for many decades. These features are under investigation and we hope to report more results soon.

Bibliography

[1] X. Ji, Phys. Rev. Lett. 110 (2013), 262002 doi:10.1103/PhysRevLett.110.262002 [arXiv:1305.1539 [hep-ph]].
[2] R. D. Ball et al. [NNPDF], Eur. Phys. J. C 82 (2022) no.5, 428 doi:10.1140/epjc/s10052-022-10328-7 [arXiv:2109.02653 [hep-ph]].
[3] L. Del Debbio, T. Giani and C. J. Monahan, JHEP 09 (2020), 021 doi:10.1007/JHEP09(2020)021 [arXiv:2007.02131 [hep-lat]].
[4] L. Del Debbio, T. Giani and M. Wilson, Eur. Phys. J. C 82 (2022) no.4, 330 doi:10.1140/epjc/s10052-022-10297-x [arXiv:2111.05787 [hep-ph]].
[5] C.K. Williams and C.E. Rasmussen, "Gaussian Processes for Machine Learning", Cambridge MA: MIT Press, 2006.