GNA: new framework for statistical data analysis
Anna Fatkina, Maxim Gonchar, Anastasia Kalitkina, Liudmila Kolupaeva, Dmitry Naumov, Dmitry Selivanov, Konstantin Treskov

TL;DR
GNA is a flexible, efficient framework for large-scale physical model fitting using data flow graphs, enabling uncertainty propagation and statistical analysis.
Contribution
It introduces a novel data flow-based framework for fitting and analyzing large-scale physical models with uncertainty handling.
Findings
Supports large parameter sets and complex models
Enables uncertainty propagation with correlations
Provides efficient, lazy evaluation of models
Abstract
We report on the status of GNA, a new framework for fitting large-scale physical models. GNA uses the data flow concept, within which a model is represented by a directed acyclic graph. Each node is an operation on an array (matrix multiplication, derivative or cross section calculation, etc.). The framework enables the user to create flexible and efficient large-scale lazily evaluated models, handle large numbers of parameters, propagate parameters' uncertainties while taking into account possible correlations between them, fit models, and perform statistical analysis. The main goal of the paper is to give an overview of the main concepts and methods as well as the reasons behind their design. Detailed technical information is to be published in further works.
The GNA framework provides a library of transformations required for building models and performing statistical analysis of data. The included transformations cover basic operations (sum, product), linear algebra (Cholesky decomposition), statistics (covariance matrix and Poisson log-likelihood calculation), calculus (differentiation and integration), physics (neutrino oscillation probability, inverse beta decay cross section), and detector effects (energy smearing, energy response distortion). The focus on reactor neutrino physics reflects the project's origin in the analysis of Daya Bay experimental data.
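The idea of a transformation as a lazily evaluated graph node can be sketched in a few lines of plain Python. This is only an illustration of the data flow concept under simplifying assumptions: the class names `Node` and `Variable`, the `invalidate` method and the `evaluations` counter are hypothetical and are not part of the GNA API.

```python
class Node:
    """A graph node: an operation on arrays whose output is computed lazily and cached."""

    def __init__(self, func, *inputs):
        self.func = func
        self.inputs = inputs
        self.dependents = []
        self._cache = None
        self.evaluations = 0  # counts how many times func actually ran
        for node in inputs:
            node.dependents.append(self)

    def invalidate(self):
        """Drop the cached output of this node and of every node downstream."""
        self._cache = None
        for node in self.dependents:
            node.invalidate()

    def data(self):
        """Return the output, recomputing it only when no cached value exists."""
        if self._cache is None:
            self._cache = self.func(*(node.data() for node in self.inputs))
            self.evaluations += 1
        return self._cache


class Variable(Node):
    """A source node holding a parameter value (hypothetical helper)."""

    def __init__(self, value):
        self.value = list(value)
        super().__init__(lambda: list(self.value))

    def set(self, value):
        self.value = list(value)
        self.invalidate()


a = Variable([1.0, 2.0])
b = Variable([3.0, 4.0])
total = Node(lambda x, y: [p + q for p, q in zip(x, y)], a, b)
product = Node(lambda x, y: [p * q for p, q in zip(x, y)], total, a)

result_first = product.data()   # triggers evaluation of the whole chain
product.data()                  # served from the cache, nothing is recomputed
b.set([0.0, 0.0])               # invalidates b, total and product, but not a
result_second = product.data()  # recomputes only the invalidated nodes
```

In this sketch, changing `b` leaves the cached output of `a` untouched, which is the kind of saving lazy evaluation is meant to provide when only a few parameters change between evaluations.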
Even after the transformations are introduced, combining them may be quite a tedious job. To make it easier we introduce bundles and expressions. A bundle is a Python class that reads a dictionary containing the configuration, initializes a set of variables, and instantiates and binds a set of transformations. The result of bundle execution is a small computational chain.
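A minimal sketch of the bundle idea follows. It assumes a toy configuration layout; the class name `ScaledArrayBundle` and the configuration keys (`name`, `variants`, `scales`, `array`) are hypothetical and chosen only for illustration, not taken from GNA.

```python
class ScaledArrayBundle:
    """Hypothetical bundle: builds one 'scale * array' chain per index variant."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.variables = {}  # variable name -> current value
        self.outputs = {}    # output name -> callable returning an array
        self._build()

    def _build(self):
        name = self.cfg["name"]
        for variant in self.cfg["variants"]:
            key = f"{name}.{variant}"
            # Initialize the variable for this variant from the configuration.
            self.variables[key] = self.cfg["scales"][variant]
            # Instantiate a transformation and bind it to the variable.
            self.outputs[key] = self._make_scaler(key, self.cfg["array"])

    def _make_scaler(self, key, array):
        def scaler():
            scale = self.variables[key]  # read at call time, so updates are seen
            return [scale * x for x in array]
        return scaler


cfg = {
    "name": "spectrum",
    "variants": ["d1", "d2"],
    "scales": {"d1": 1.0, "d2": 0.5},
    "array": [2.0, 4.0],
}
bundle = ScaledArrayBundle(cfg)
before = bundle.outputs["spectrum.d2"]()
bundle.variables["spectrum.d2"] = 2.0
after = bundle.outputs["spectrum.d2"]()
```

The point of the pattern is that the caller supplies only a configuration dictionary and gets back named variables and outputs that can be bound into a larger graph.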
A special tool is provided for binding small computational graphs together to form a large chain. This tool, called Expression, uses mathematical expressions to describe the connections between transformations and bundles. An Expression is initialized with the following information: a) a mathematical expression, b) the definition of the indices used in the expression, c) configurations for the bundles to be executed to provide the expression elements.
The expressions are parsed within a predefined Python environment and are valid Python expressions. The only distinction is that we use the 'func| arg1, arg2, ...' notation for 'func(arg1, arg2, ...)' in order to improve the readability of chained calls. Consider the following example:
[TABLE]
The expressions are parsed without actual knowledge of what the data and functions are: each name is considered to be an index, a variable or a transformation output. Here vec[j] is a transformation output providing an array. The bundle that provides vec will provide an output for each variant of index j. scale[i] is a variable with variants for i1, i2, etc. The multiplication scale[i] * vec[j] will provide a transformation that scales an array for each combination of the indices i and j. Then the offset array will be added to each result.
func is a function with one argument and one return value. The call operation means that the output associated with the argument is bound to the input associated with the function. As in the case of multiplication, each combination of indices is taken into account. Since there is an output for each combination of i and j values and the function has several variants, represented by index k, a call will produce an output for each combination of i, j and k variants. Lastly, sum[k] will sum over index k and provide an output for each of the i and j variants. Then, for each unknown name, the Expression finds a bundle configuration and executes the corresponding bundle, passing the indices assigned to the name. The bundle should then provide all the necessary inputs, outputs and variables. Once the bundles are executed, the Expression will bind the transformations together.
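The index bookkeeping described above can be mimicked with dictionaries keyed by index variants. The sketch assumes the example expression has the form `sum[k]| func[k]| offset + scale[i] * vec[j]`, which is inferred from the description rather than quoted from the listing. The number of variants per index, all numerical values and the two `func` variants are made up for illustration.

```python
from itertools import product

# Hypothetical index variants.
i_variants = ["i1", "i2"]
j_variants = ["j1", "j2", "j3"]
k_variants = ["k1", "k2"]

# Hypothetical variables, arrays and functions.
scale = {"i1": 1.0, "i2": 2.0}
vec = {"j1": [1.0, 2.0], "j2": [3.0, 4.0], "j3": [5.0, 6.0]}
offset = [0.5, 0.5]
func = {
    "k1": lambda arr: [2.0 * x for x in arr],
    "k2": lambda arr: [x * x for x in arr],
}

# offset + scale[i] * vec[j]: one output per (i, j) combination.
shifted = {
    (i, j): [o + scale[i] * x for o, x in zip(offset, vec[j])]
    for i, j in product(i_variants, j_variants)
}

# func[k]| ...: one output per (i, j, k) combination.
called = {
    (i, j, k): func[k](shifted[i, j])
    for i, j, k in product(i_variants, j_variants, k_variants)
}

# sum[k]| ...: index k is summed out, leaving one output per (i, j).
summed = {
    (i, j): [sum(vals) for vals in zip(*(called[i, j, k] for k in k_variants))]
    for i, j in product(i_variants, j_variants)
}

print(len(shifted), len(called), len(summed))
```

With two variants of i, three of j and two of k, the intermediate stage holds twelve arrays, while the stages before the call and after the summation hold six each.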
The graph produced by the example expression above for a particular choice of index variants is shown in figure 1. The graph represents only the transformations and their connections; variables are not plotted.
The Expression module provides operators for addition, multiplication and division, as well as summation, multiplication and concatenation over indices. These few elements, together with a set of bundles, are enough to build large-scale models. For example, the Daya Bay model is described by 6 indices, roughly 50 items in the expression, and 25 configuration items that use 16 bundles. The Expression produces a computational chain of 732 nodes and 1624 edges (736 outputs with 13.6 Mb of data) and 498 parameters (114 fixed, 97 variable and 287 evaluated). The full computational time is 10 ms. Even a slight modification of the order of instructions in the expression yields a very different computational graph with 2412 nodes and 5968 edges (4416 outputs with 20.3 Mb of data). The latter chain produces an identical result with a full computational time of 20 ms. A more detailed description of the example may be found in [Gonchar:1810db].
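Why the order of operations changes the graph size can be seen in a toy case that is unrelated to the actual Daya Bay expression. The sketch compares two algebraically equivalent forms, `scale[i] * (sum[k]| vec[k])` and `sum[k]| scale[i] * vec[k]`, and counts one node per multiplication or summation. The index sizes, values and function names are assumptions made for illustration.

```python
i_variants = ["i1", "i2", "i3"]
k_variants = ["k1", "k2", "k3", "k4"]
scale = {"i1": 1.0, "i2": 2.0, "i3": 4.0}
vec = {"k1": [1.0, 2.0], "k2": [3.0, 4.0], "k3": [5.0, 6.0], "k4": [7.0, 8.0]}


def add_arrays(arrays):
    """Element-wise sum of several equally sized arrays."""
    return [sum(vals) for vals in zip(*arrays)]


def build_sum_first():
    """scale[i] * (sum[k]| vec[k]): one shared sum node, one product per i."""
    nodes = 1  # the single summation over k
    total = add_arrays(vec[k] for k in k_variants)
    results = {}
    for i in i_variants:
        results[i] = [scale[i] * x for x in total]
        nodes += 1  # one product node per i
    return results, nodes


def build_scale_first():
    """sum[k]| scale[i] * vec[k]: one product per (i, k), one sum per i."""
    nodes = 0
    results = {}
    for i in i_variants:
        scaled = []
        for k in k_variants:
            scaled.append([scale[i] * x for x in vec[k]])
            nodes += 1  # one product node per (i, k)
        results[i] = add_arrays(scaled)
        nodes += 1  # one summation node per i
    return results, nodes


results_a, nodes_a = build_sum_first()
results_b, nodes_b = build_scale_first()
print(nodes_a, nodes_b, results_a == results_b)
```

Both forms give the same arrays, but the second one needs a product for every (i, k) pair and a separate sum for every i, so its node count grows with the product of the index sizes rather than their sum.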
