Amplitude analysis of four-body decays using a massively-parallel fitting framework
C. Hasse, J. Albrecht, A. A. Alves Jr., P. d'Argent, T. D. Evans, J. Rademacker, M. D. Sokoloff

TL;DR
This paper introduces an extension to the GooFit framework enabling efficient time-dependent amplitude analysis of four-body decays on GPUs, demonstrating significant performance gains and sensitivity to D0-D0bar mixing parameters.
Contribution
The paper presents a novel GPU-accelerated extension to GooFit for four-body decay amplitude analysis, improving computational speed and sensitivity in mixing parameter measurements.
Findings
GPU implementation significantly outperforms CPU in analysis speed
Sensitivity to D0-D0bar mixing parameters achieved at 0.019%
Extension enables detailed studies of four-body decay dynamics
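Table 1. Statistical sensitivity to the mixing parameters $x$ and $y$ for different sample sizes.

| Events | Sensitivity of $x$ | Sensitivity of $y$ |
|---|---|---|
| 20000 | | |
| 50000 | 0.019 % | 0.019 % |
| 70000 | | |

Table 2. Pseudo-event generation on the three platforms for different sample sizes.

| Events | 2 Intel Xeon E5-2680 v3 2.50GHz, 24 Cores | 2 Intel Xeon E5-2680 v3 2.50GHz, 48 Cores | NVIDIA GT 525M, 96 Cores | NVIDIA K40, 2880 Cores |
|---|---|---|---|---|

Table 3. Fit performance on the three platforms for different numbers of normalization points.

| Points | 2 Intel Xeon E5-2680 v3 2.50GHz, 24 Cores | 2 Intel Xeon E5-2680 v3 2.50GHz, 48 Cores | NVIDIA GT 525M, 96 Cores | NVIDIA K40, 2880 Cores |
|---|---|---|---|---|
| - | | | | |
| - | | | | |
| - | | | | |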
C Hasse$^{1,2,3}$, J Albrecht$^{2}$, A A Alves Jr.$^{3}$, P d’Argent$^{4}$, T D Evans$^{5}$, J Rademacker$^{6}$ and M D Sokoloff$^{3}$
1 CERN, CH-1211 Geneva 23, Switzerland
2 Experimentelle Physik V, Technische Universität Dortmund, Otto-Hahn-Straße 4, 44227 Dortmund, Germany
3 Physics Department, University of Cincinnati, 2600 Clifton Ave. Cincinnati, OH 45221, USA
4 Physikalisches Institut, Universität Heidelberg, Im Neuenheimer Feld 226, 69120 Heidelberg, Germany
5 Department of Physics, University of Oxford, Parks Road, Oxford OX1 3PU, UK
6 School of Physics, University of Bristol, Tyndall Avenue, Bristol BS8 1TL, UK
Abstract
The GooFit Framework is designed to perform maximum-likelihood fits for arbitrary functions on various parallel back ends, for example a GPU. We present an extension to GooFit which adds the functionality to perform time-dependent amplitude analyses of pseudoscalar mesons decaying into four pseudoscalar final states. Benchmarks of this functionality show a significant performance increase when utilizing a GPU compared to a CPU. Furthermore, this extension is employed to study the sensitivity on the mixing parameters $x$ and $y$ in a time-dependent amplitude analysis of the decay $D^0 \rightarrow K^+\pi^-\pi^+\pi^-$. Studying a sample of 50 000 events and setting the central values to the world average of $x = (0.49 \pm 0.15)\,\%$ and $y = (0.61 \pm 0.08)\,\%$, the statistical sensitivities of $x$ and $y$ are determined to be $\sigma(x) = 0.019\,\%$ and $\sigma(y) = 0.019\,\%$.
1 Introduction
In physics analyses it is common to fit a theoretical model to observed data to extract parameters of interest. This involves minimizing the differences between a model and data, which is typically done by minimizing a cost function, for example the negative log-likelihood. However, the computations become very expensive as the complexity of the models and the number of events increase. The GooFit [1, 2, 3] framework has been designed to address this issue by allowing such computations to be performed in parallel. It is built upon the Thrust library [4] to be able to run on different parallel architectures, while maintaining a control flow similar to the RooFit package [5], which is commonly used in high energy physics to fit theoretical models to data, and which only runs on CPUs. While GooFit has been successfully employed in several analyses, even for complex models such as time-dependent mixing in three-body decays, it did not allow for time-dependent amplitude analyses of four-body decays. This functionality was recently added and is described in this paper.
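To illustrate why such fits parallelize well, the following minimal sketch evaluates an unbinned negative log-likelihood as a map over events followed by a reduction. It is plain Python, not GooFit code: the Gaussian model, the function names and the thread pool are assumptions chosen for illustration only.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def gauss_logpdf(x, mean, sigma):
    """Log of a normalized Gaussian density evaluated at x."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def partial_nll(chunk, mean, sigma):
    """Map and local reduce: negative log-likelihood of one chunk of events."""
    return -sum(gauss_logpdf(x, mean, sigma) for x in chunk)


def nll(events, mean, sigma, n_workers=4):
    """Total negative log-likelihood as a sum of independent partial sums."""
    size = max(1, math.ceil(len(events) / n_workers))
    chunks = [events[i:i + size] for i in range(0, len(events), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: partial_nll(c, mean, sigma), chunks)
        return sum(parts)
```

Because each event contributes an independent term, the chunks can be processed in any order and on any number of workers, which is the property a GPU back end exploits.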
2 Mixing in the decay $D^0 \rightarrow K^+\pi^-\pi^+\pi^-$
Mixing or oscillation of neutral mesons is a process during which a particle transitions into its antiparticle or vice versa. This process has been observed in the $K^0$, $B^0$, $B^0_s$ and $D^0$ systems. The $D^0$ system is the only one composed of up-type quarks.
One possible decay to study the phenomenon of mixing in the neutral charm meson system is the decay of $D^0$ to $K^+\pi^-\pi^+\pi^-$. This decay can proceed via two different decay amplitudes, which are depicted in figure 1. The top arrow depicts the direct decay, while the bottom arrow represents the decay proceeding via mixing into a $\bar{D}^0$, which then decays into the same final state via a second amplitude. Because the mixing of a $D^0$ into a $\bar{D}^0$ is time-dependent, the overall decay rate becomes time-dependent. Analysing such time-dependent decay rates allows extraction of the mixing properties of the $D^0$ system. The expression for the time-dependent decay rate of the $D^0$, assuming no CP violation, can be derived to be [6],
[TABLE]
Most of the complexity of this expression lies within the model used to describe the two amplitudes.
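As a hedged illustration of how mixing makes the rate time-dependent, the sketch below evaluates the rate for a state produced as a $D^0$ from two complex amplitudes at a single phase-space point. The conventions are assumptions, not taken from the paper: no CP violation ($q/p = 1$), $x$ defined as the mass difference of the two mass eigenstates divided by the mean width, $y$ as half the width difference divided by the mean width, and an overall normalization of one. Signs of the interference terms may differ from those in equation 1 under other conventions. The function and argument names are hypothetical.

```python
import cmath
import math


def decay_rate(t, amp_direct, amp_mixed, x, y, gamma=1.0):
    """Rate for a state produced as D0 to reach the final state at time t.

    amp_direct: complex amplitude of the direct decay D0 -> f
    amp_mixed:  complex amplitude of D0bar -> f, reached via mixing
    x, y:       dimensionless mixing parameters (assumed conventions)
    gamma:      mean decay width; t is measured in units of 1/gamma
    """
    z = 0.5 * gamma * t * complex(x, -y)
    g_plus = cmath.cos(z)         # weight of the unmixed component
    g_minus = 1j * cmath.sin(z)   # weight of the mixed component
    total = g_plus * amp_direct + g_minus * amp_mixed
    return math.exp(-gamma * t) * abs(total) ** 2
```

At $t = 0$ only the direct amplitude contributes, and for $x = y = 0$ the rate reduces to a pure exponential decay, which are two quick sanity checks on any implementation.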
3 Structure and implementation of four-body amplitudes
While equation 1 is completely general, the amplitudes that encode the properties of the decay are functions of the position in phase space occupied by the final state of the decay. The amplitude structure of a four-body decay is significantly more complicated than that of a three-body decay because its phase space is five-dimensional, while three-body decays merely occupy a two-dimensional phase space.
Similar to other amplitude models, the implemented functionality assumes that multi-body decays mostly proceed via quasi two-body processes, which include two-body resonances.
This leads to two possible decay chain topologies depicted in figure 2, in which intermediate resonances decay to the four final decay products in various configurations. Here, each intermediate resonance can take the form of multiple kinematically allowed resonance states, resulting in many possible decay chains. A complete amplitude will therefore be modelled by a coherent sum over these decay chains as,
[TABLE]
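The coherent sum can be sketched as follows. This is a toy under stated assumptions: each decay chain is reduced to a single constant-width Breit-Wigner in one invariant mass squared, the resonance parameters and complex coefficients are made-up numbers, and the function names are hypothetical rather than GooFit classes.

```python
def breit_wigner(m_sq, mass, width):
    """Toy relativistic Breit-Wigner lineshape with constant width."""
    return 1.0 / complex(mass * mass - m_sq, -mass * width)


def coherent_sum(chains, point):
    """Sum complex chain amplitudes, each scaled by a complex coefficient."""
    return sum(coeff * chain(point) for coeff, chain in chains)


# Two toy decay chains with illustrative (not physical) parameters.
chains = [
    (complex(1.0, 0.0), lambda m_sq: breit_wigner(m_sq, 0.8, 0.1)),
    (complex(0.0, 0.5), lambda m_sq: breit_wigner(m_sq, 1.0, 0.2)),
]

point = 0.7
intensity = abs(coherent_sum(chains, point)) ** 2
incoherent = sum(abs(c * f(point)) ** 2 for c, f in chains)
interference = intensity - incoherent
```

The non-zero `interference` term is what distinguishes a coherent sum from simply adding the intensities of the individual chains.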
Each decay chain is constructed by the user from classes representing form factors, spin factors, resonance lineshapes, and possibly, in the case of two identical final state particles, Bose-symmetrization. After constructing all necessary decay chains, the user constructs two amplitude class instances representing the two amplitudes, each of which holds the decay chains needed to fit the theoretical model. The model creation is finalized by creating an instance of the time-dependent amplitude model class and passing it the two amplitudes just created. Upon creation, the time-dependent model class automatically checks for recurring form factors, spin factors, and lineshapes in all decay chains. In case of multiple occurrences, these instances are replaced by a link to a single instance, thus removing redundant calculations. The subsequent steps of the internal model building process are explained in detail in [1, 2].
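The substitution of recurring components by a single shared instance can be illustrated with a small cache. This is a sketch of the general idea only; the class, the dictionary-based component representation and the labels are hypothetical and do not reflect GooFit's internal data structures.

```python
class ComponentCache:
    """Hands out one shared instance per distinct component description."""

    def __init__(self):
        self._instances = {}

    def get(self, kind, *params):
        # Components with the same kind and parameters share one instance.
        key = (kind, params)
        if key not in self._instances:
            self._instances[key] = {"kind": kind, "params": params}
        return self._instances[key]

    def __len__(self):
        return len(self._instances)


def build_chain(cache, components):
    """Replace each (kind, params) description by its shared instance."""
    return [cache.get(kind, *params) for kind, params in components]


cache = ComponentCache()
chain_1 = build_chain(cache, [("lineshape", ("resonance_1",)),
                              ("spin_factor", ("type_a",))])
chain_2 = build_chain(cache, [("lineshape", ("resonance_1",)),
                              ("spin_factor", ("type_b",))])
```

Both chains refer to the same lineshape object, so its value needs to be computed only once per event and can be reused by every chain that links to it.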
3.1 Normalization and event generation
During the fitting procedure the complete expression in equation 1 must be normalized accurately. As it is not feasible to find an analytic expression for such a complex function, the normalization is computed numerically. In our study, this requires evaluating the function at several million phase space points. To achieve a sufficiently fast generation of phase space events, we integrated the MCBooster library [7, 8], which allows very fast generation of phase space events on the GPU. This also enables the generation of pseudo-events, which are uniformly distributed phase space events weighted by the previously created amplitude model.
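Both steps can be sketched in a few lines, assuming a toy one-dimensional "phase space" on the unit interval instead of the five-dimensional four-body phase space. The function names and the toy density are hypothetical, and the sketch does not use MCBooster.

```python
import random


def mc_normalization(density, sample_point, n_points, rng):
    """Average of an unnormalized density over uniformly drawn points.

    Multiplying the result by the phase-space volume gives the integral.
    """
    total = sum(density(sample_point(rng)) for _ in range(n_points))
    return total / n_points


def generate_weighted(density, sample_point, n_events, rng):
    """Uniformly distributed points, each paired with the density as weight."""
    points = [sample_point(rng) for _ in range(n_events)]
    return [(p, density(p)) for p in points]


def toy_density(u):
    """Toy model on [0, 1] whose exact integral is 1."""
    return 3.0 * u * u


def uniform_point(rng):
    """Draw one point uniformly from the toy phase space [0, 1)."""
    return rng.random()
```

The statistical precision of the normalization improves with the square root of the number of points, which is why several million evaluations, and therefore a fast generator, are needed.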
3.2 Validation
As this work implemented various new building blocks to model four-body decay amplitudes in GooFit, it was important to validate the correctness of each of these new components. A cross check of the implementation was performed by comparing the newly implemented functionality of GooFit to the software package MINT3 [9]. MINT3 is based upon the MINT (Minuit Interface) package [10], which is used to perform time-integrated amplitude analyses of three- and four-body decays. Additionally, it supports the generation of pseudo-events. We generate pseudo-events for a specific amplitude model, which includes all newly implemented building blocks, and compare the resulting event samples. This comparison is performed by studying the phase space projections of the samples onto the five variables $m_{12}$, $m_{34}$, $\cos\theta_{12}$, $\cos\theta_{34}$ and $\phi$, where the subscript 12 refers to the $\pi^+\pi^-$ pair and 34 to the $K^+\pi^-$ pair.
As shown in figure 3, there are no significant differences observed and the pull distribution as well as the p-value indicate that both samples are drawn from the same distribution.
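A per-bin pull comparison of two samples can be sketched as below. The sketch assumes two unweighted samples of equal size with Poisson uncertainties in each bin; the binning helper and function names are hypothetical, and the conversion of the resulting $\chi^2$ into a p-value is omitted.

```python
import math


def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width bins on [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


def pulls(counts_a, counts_b):
    """Per-bin pull (a - b) / sqrt(a + b); empty bin pairs are skipped."""
    return [(a - b) / math.sqrt(a + b)
            for a, b in zip(counts_a, counts_b) if a + b > 0]


def chi2(counts_a, counts_b):
    """Sum of squared pulls over all populated bins."""
    return sum(p * p for p in pulls(counts_a, counts_b))
```

If both samples follow the same distribution, the pulls should scatter around zero with unit width and the $\chi^2$ should be close to the number of populated bins.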
4 Statistical sensitivity to the charm mixing parameters $x$ and $y$
The novel functionality of GooFit has successfully been used to determine the statistical sensitivity on the charm mixing parameters $x$ and $y$ in a time-dependent amplitude analysis of the decay $D^0 \rightarrow K^+\pi^-\pi^+\pi^-$. This study did not account for resolution effects or background in the data, and did not allow the model to float. Therefore, the real sensitivity will be worse than shown in table 1. Nevertheless, this study demonstrates that the newly implemented extension in GooFit is fully functional.
5 Performance comparison between CPU and GPU
Lastly, we present a performance comparison of the newly implemented functionality between the CPU and GPU. Two different test cases are used to study the performance. The first one targets the generation speed of pseudo-events according to a time-dependent amplitude model. This generation is repeated for three different sample sizes to study the scaling behavior. Secondly, the performance of the fitting procedure is studied, where the scaling behavior is probed by increasing the number of events used in the normalization while keeping the size of the fitted sample constant.
These tests are repeated on three different platforms: a server with two Intel Xeon E5-2680 v3 CPUs, each with 12 physical cores that can run two concurrent threads, an NVIDIA K40 GPU, and a mid-range mobile gaming GPU, the NVIDIA GeForce GT 525M. The results are obtained by averaging over five runs and are listed in tables 2 and 3. They show a significant speedup when utilizing the K40. Even the outdated mid-range mobile graphics card performed surprisingly well compared to the other two platforms, but due to insufficient memory it was not able to complete all tests.
While the non-linear scaling from 24 to 48 cores was expected, as one only increases the logical number of cores by running two threads per core, the performance gain of the K40 over the GT 525M was smaller than expected a priori. Using the available NVIDIA profiler, we determined that the throttled performance on the K40 is due to memory latency. We hope to reduce this in the future by reducing the memory used as well as adapting the current memory layout to make memory transfers more efficient.
6 Summary
In conclusion, we have presented a novel extension to the GooFit framework which allows for performing a time-dependent amplitude analysis of a pseudoscalar meson decaying into four pseudoscalar final states. Additionally, this extension allows the user to generate pseudo-events according to a previously defined time-dependent amplitude model. This functionality was successfully validated by comparing the results to an existing software package and was furthermore used to study the sensitivity to the charm mixing parameters in the decay $D^0 \rightarrow K^+\pi^-\pi^+\pi^-$. Lastly, it is shown that a significant speedup is gained by utilizing the GPU, while an even bigger performance gain is foreseen once the memory layout in GooFit has been adapted to minimize memory latency on high performance GPUs like the K40.
The GooFit package can be found on GitHub at https://github.com/GooFit.
Acknowledgments
I would like to thank the authors and maintainers of the MINT and MINT3 framework, P. d’Argent, T.D. Evans and J. Rademacker, as their work and support have been most helpful in implementing the presented extension to GooFit.
Work sponsored by the Wolfgang Gentner Programme of the Federal Ministry of Education and Research.
The development of this extension has been in part supported by the National Science Foundation under grant number PHY-1414736.
NVIDIA provided K40 GPUs for our use through its University Partnership program.
- [1] Andreassen R, Meadows B T, de Silva M, Sokoloff M D and Tomko K 2014 Journal of Physics: Conference Series 513 052003
- [2] Andreassen R E, de Silva W M, Meadows B T, Sokoloff M D and Tomko K A 2014 IEEE Access 2 160–176
- [3] The GooFit package https://github.com/GooFit/GooFit
- [4] The Thrust library https://thrust.github.io/
- [5] Verkerke W and Kirkby D P 2003 eConf C0303241 (Preprint physics/0306116)
- [6] Sozzi M S 2008 Discrete symmetries and CP violation: From experiment to theory
- [7] The MCBooster library https://github.com/MultithreadCorner/MCBooster
- [8] Alves Jr A A et al. 2016 MCBooster: a library for fast Monte Carlo generation of phase-space decays in massively parallel platforms. http://indico.cern.ch/event/505613/contributions/2230884/
