Application of SNiPER framework to BESIII physics analysis
Xin Xia, Teng Li, Xing-Tao Huang, Xue-Yao Zhang

TL;DR
This paper presents a new SNiPER-based framework for BESIII physics analysis, significantly improving data processing speed through a redesigned event data model and lazy-loading techniques, demonstrated with a real physics analysis.
Contribution
The paper introduces a SNiPER-based analysis framework with a novel reconstructed event data model and SmartRef lazy-loading, enhancing processing efficiency for BESIII data.
Findings
Achieved a 10.3-fold speedup in Input/Output operations.
Demonstrated the framework with a real physics analysis of e+e- to pi+pi-J/psi.
Identified the data model and SmartRef lazy-loading as key to performance gains.
Abstract
A fast physics analysis framework has been developed based on SNiPER to process the increasingly large data sample collected by BESIII. In this framework, a reconstructed event data model with SmartRef is designed to improve the speed of Input/Output operations, and necessary physics analysis tools are migrated from BOSS to SNiPER. A real physics analysis is used to test the new framework, and achieves a factor of 10.3 improvement in Input/Output speed compared to BOSS. Further tests show that the improvement is mainly attributed to the new reconstructed event data model and the lazy-loading functionality provided by SmartRef.
Supported by the Joint Large-Scale Scientific Facility Funds of the NSFC and CAS (U1532258), the Program for New Century Excellent Talents in University (NCET-13-0342), the Shandong Natural Science Funds for Distinguished Young Scholar (JQ201402), and the National Key Basic Research Program of China under Contract (2015CB856700).
Shandong University (SDU), Jinan, Shandong 250100, China
keywords:
BESIII, SNiPER, SmartRef, Software, Event Data Model
pacs:
29.85.-c, 02.70.Hm
1 Introduction
The Beijing Spectrometer III (BESIII) [2] is a detector at the Beijing Electron–Positron Collider II (BEPCII). The accelerator has two storage rings with a circumference of 224 m and a crossing angle of 22 mrad. Its design peak luminosity is $1\times10^{33}$ cm$^{-2}$s$^{-1}$ at a beam energy of 1.89 GeV [2], a goal that BEPCII reached in April 2016. Assuming s data taking time each year, the BESIII detector is able to collect 10 billion , 300 million , 30 million or 2 million . This large data sample makes it possible to study light hadron spectroscopy in the decays of charmonium states and charmed mesons with unprecedented precision [3]. Since its first collisions in June 2008, BESIII has accumulated 3 PB of data, and the volume is growing by about 0.5 PB per year.
The BESIII Offline Software System (BOSS) is the main framework currently used in the BESIII experiment, and it plays a central role in the whole offline data processing and physics analysis workflow. Based on Gaudi [4], BOSS provides standard interfaces for the common software components needed for data processing and analysis. However, Input/Output (I/O) in BOSS requires data format conversion and reads in every event at full size, which costs extra CPU time and limits the processing speed [5]. Consequently, I/O is becoming a bottleneck for BOSS, especially for physics analysis.
The Software for Non-collider Physics Experiments (SNiPER) [6] is a software framework for simulation, reconstruction and analysis in a variety of experiments like the Jiangmen Underground Neutrino Observatory (JUNO) [7] and the Large High Altitude Air Shower Observatory (LHAASO) [8]. SNiPER is a light-weight, flexible framework with an event data management system which is designed to manage any type of event data. Consequently, no additional data format conversion is needed. In order to improve the speed of BESIII physics analysis, SNiPER is being applied to the BESIII experiment. In this article, details are given of the redesign of the reconstructed event data model, the migration of BESIII physics analysis tools into SNiPER, and a comparison of its performance with that of BOSS.
2 Redesign of the Reconstructed Event Data Model
For physics analysis in BOSS, the full information of each reconstructed event is read in from the Data Summary Tape (DST) files, where data are stored as ROOT trees. The event data are then converted to Gaudi's data format to be managed by the Gaudi event data store. This conversion requires extra CPU time and memory, and thus slows down the physics analysis. The analysis algorithms then process the input information of each event for further event selection. Because every event is read in at full size, including events that are rejected immediately, this also degrades the speed of analysis.
In SNiPER, the event data management system is designed to use the ROOT format throughout the whole process. Event information that is read in from the DST files can be handled directly by SNiPER's event data management (EDM) system, so no conversion is needed. The I/O system is designed to first read only part of the event information for fast event selection. Only once an event meets all selection requirements is its full information read. The time spent on data I/O is therefore significantly reduced, in a way similar to the tag-based pre-selection mechanism under the BOSS framework [9].
To use the data management system in SNiPER, a new EDM for the BESIII physics analysis has been designed. This reconstructed event data model consists of two layers, DstEvent and EvtHeader, as shown in Fig. 2. The DstEvent contains the full information of physics events, and keeps the original structure used in the old DST files. The EvtHeader is a newly added layer. It increases the file size only negligibly (at the per-mille level), yet it plays three important roles in the framework:
- It is the entrance point for the event data service to access new ROOT files.
- It stores characteristic variables of each event, such as the good charged tracks, which users can customize to do fast selection without loading the full event information.
- It stores the SmartRef, which is a smart pointer providing references to events [10]. The referenced object in DstEvent will not be loaded from input files until it is actually needed. Therefore, the unnecessary performance overhead can be avoided as a result of the lazy-loading mechanism.
In the analysis algorithm, fast selection is usually applied as soon as the EvtHeader is available. Fast selection uses only the variables in the EvtHeader, without loading the full event into memory. After fast selection, the full information of the surviving events is requested and loaded into memory through the SmartRef for further analysis. With this mechanism, significantly less information needs to be read from disk, so less time is spent on I/O operations. The stricter the selection, the faster the I/O operations. These are the main strategies used to improve the overall speed of analysis.
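The header-first selection described above can be illustrated with a minimal Python sketch. This is not the SNiPER or BESIII API: the `SmartRef` and `EvtHeader` classes below are hypothetical stand-ins that borrow the names from the text, and the "full event" is a toy dictionary built on demand instead of an object read from a ROOT file.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SmartRef:
    """Hypothetical stand-in for a lazy reference: loads its target on first access."""

    def __init__(self, loader: Callable[[], dict]):
        self._loader = loader               # callable that reads the full event from storage
        self._target: Optional[dict] = None
        self.loads = 0                      # counts how often storage was actually read

    def get(self) -> dict:
        if self._target is None:            # load only on the first request, then cache
            self._target = self._loader()
            self.loads += 1
        return self._target


@dataclass
class EvtHeader:
    n_charged: int                          # tag variable used for fast selection
    dst_event: SmartRef                     # lazy reference to the full event


def make_events(track_counts):
    """Build toy headers; each full event is a dict created only when requested."""
    return [
        EvtHeader(n, SmartRef(lambda i=i, n=n: {"id": i, "tracks": list(range(n))}))
        for i, n in enumerate(track_counts)
    ]


def analyse(headers, min_charged=4):
    """Apply the fast selection on the header; load the full event only for survivors."""
    selected = []
    for hdr in headers:
        if hdr.n_charged < min_charged:     # rejected without touching the full event
            continue
        selected.append(hdr.dst_event.get())
    return selected


headers = make_events([2, 4, 3, 5, 1, 6])
survivors = analyse(headers)
full_loads = sum(h.dst_event.loads for h in headers)
print(len(survivors), "events selected;", full_loads, "of", len(headers), "full events loaded")
```

In this toy model the number of full loads equals the number of events passing the header cut, which is why a stricter fast selection translates directly into less I/O.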
Figure: Schema of the reconstructed event data model. The left-hand side is the EDM under BOSS, and the right-hand side is the new EDM under SNiPER.
3 Migration of Analysis Tools
In BESIII physics analysis, several tools are indispensable, including particle identification (ParticleID), vertex fit (VertexFit) and kinematic fit (KinematicFit).
To organize the information left in the detectors by the final-state particles, a simplified version of the EvtRecTrack class was imported. It combines the tracks from all sub-detectors into one logical track, ordered from the innermost sub-detector outwards, with the corresponding track IDs stored in the TEvtRecTrack. Determining the vertex requires the primary vertex information and the magnetic field information, so the DatabaseSvc and MagneticField packages were also migrated. In all the migrated packages, an interface was developed to access the data in the new format. For simplicity, TObject classes are used in place of the EventObject classes. The final workflow model used in SNiPER can be seen in Fig. 3.
Figure: The new workflow for BESIII analysis in SNiPER.
4 Performance and Tests
To validate the migration, we ran the real physics analysis of the process $e^+e^-\to\pi^+\pi^-J/\psi$ at center-of-mass (CM) energy of GeV [11] with both BOSS and SNiPER. In the analysis, the $J/\psi$ candidate was reconstructed with lepton pairs ($e^+e^-$ or $\mu^+\mu^-$), which results in a final state with four charged tracks. The number of charged tracks was therefore required to be no less than 4, which leaves approximately 1/200 of the events. In SNiPER, the number of charged tracks is stored in the EvtHeader and defined as a tag for fast selection, so only the selected events are fully loaded into memory for further study, which greatly increases the input speed. In this test, the version of BOSS was , and 400 DST files were randomly selected for analysis. Table 4 shows the number of surviving events after a series of selections under the two frameworks. The event selection results in SNiPER were exactly the same as in BOSS.
Table: Number of surviving events passing cuts in BOSS and SNiPER.

| No. | Selection | BOSS | SNiPER |
|---|---|---|---|
| 1 | Total entries | 75159209 | 75159209 |
| 2 | Charged tracks | 389796 | 389796 |
| 3 | Good charged tracks | 158595 | 158595 |
| 4 | Good photon | 21959 | 21959 |
| 5 | Particle identification | 20483 | 20483 |
| 6 | Kinematic fit | 3748 | 3748 |
| 7 | Save result | 401 | 401 |
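
As a quick cross-check of the cut-flow table above, the following Python sketch computes the per-step and cumulative survival fractions from the quoted event counts. The counts are taken from the table; the variable and function names are illustrative only. The charged-track requirement keeps roughly 1 event in 193, consistent with the "approximately 1/200" quoted in the text.

```python
# Event counts copied from the cut-flow table (identical for BOSS and SNiPER).
cutflow = [
    ("Total entries", 75159209),
    ("Charged tracks", 389796),
    ("Good charged tracks", 158595),
    ("Good photon", 21959),
    ("Particle identification", 20483),
    ("Kinematic fit", 3748),
    ("Save result", 401),
]


def efficiencies(steps):
    """Return (name, count, fraction of previous step, fraction of total) per step."""
    total = steps[0][1]
    previous = total
    rows = []
    for name, count in steps:
        rows.append((name, count, count / previous, count / total))
        previous = count
    return rows


rows = efficiencies(cutflow)
for name, count, step_frac, cum_frac in rows:
    print(f"{name:<24}{count:>10}  step={step_frac:.4f}  cumulative={cum_frac:.2e}")

# Rejection factor of the tag-based pre-selection (the only cut applied on the header).
preselection_rejection = 1.0 / rows[1][2]
print(f"pre-selection keeps about 1 event in {preselection_rejection:.0f}")
```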
After checking the step-by-step selections, the invariant mass spectrum of selected candidates was compared between BOSS and SNiPER using the whole dataset. Figure 1 shows the distributions of , , and for the signal events. The distributions in BOSS and SNiPER agree with each other very well, which means the analysis code migrated to SNiPER works properly.
To quantify the improvement gained by the new physics analysis framework, we ran a series of tests with the same analysis of $e^+e^-\to\pi^+\pi^-J/\psi$ at the CM energy of GeV, using the same data sample as the previous test. Under SNiPER, the total number of charged tracks, which is required to be no less than 4, is added to the EvtHeader as a pre-selection variable. The 400 input data files were divided equally into 4 groups and submitted to the Portable Batch System in queue besq. Timings were measured on the same hardware, an Intel(R) Xeon(R) E5-2680 v3 CPU at 2.50 GHz. The analysis using the old EDM and BOSS took 170.5 minutes on average, while the new EDM and SNiPER took 44.5 minutes on average, so the new framework is 3.8 times faster than BOSS. To investigate where the speedup comes from, the time consumption of each section was measured and is listed in Table 4. The proportion of time consumed by each section under the two frameworks can also be seen in Fig. 4.
Table: Comparison of time consumption for BOSS and SNiPER.

| | Framework | EDM & I/O | Analysis |
|---|---|---|---|
| BOSS (min) | 1.0 | 135.0 | 34.6 |
| SNiPER (min) | 0.18 | 13.1 | 31.2 |
| Gain | 5.6 | 10.3 | 1.1 |
These tests indicate that the SNiPER framework itself runs about 5.6 times faster than BOSS, but this contributes very little to the overall gain because the framework accounts for only a small share of the total time under either system. With the new EDM with SmartRef, the I/O speed improves by a factor of about 10, which reduces the share of I/O in the total time from about 80% to about 30%. The analysis step takes a similar time in SNiPER and in BOSS, but its share of the total increases significantly because of the faster I/O, which means computing power is concentrated on the real analysis instead of data conversion.
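The gains and proportions quoted above follow directly from the timing table, as the short Python sketch below shows. The timings are copied from the table; the dictionary layout and function names are illustrative. The per-section times sum to 170.6 and 44.48 minutes, which matches the quoted averages of 170.5 and 44.5 minutes to within rounding.

```python
# Per-section wall time in minutes, copied from the timing table.
timing_min = {
    "BOSS":   {"Framework": 1.0,  "EDM & I/O": 135.0, "Analysis": 34.6},
    "SNiPER": {"Framework": 0.18, "EDM & I/O": 13.1,  "Analysis": 31.2},
}


def section_gains(timing):
    """Speedup of SNiPER over BOSS for each section (BOSS time / SNiPER time)."""
    return {s: timing["BOSS"][s] / timing["SNiPER"][s] for s in timing["BOSS"]}


def section_shares(sections):
    """Fraction of the total time spent in each section for one framework."""
    total = sum(sections.values())
    return {s: t / total for s, t in sections.items()}


gains = section_gains(timing_min)
shares = {fw: section_shares(sec) for fw, sec in timing_min.items()}
total_gain = sum(timing_min["BOSS"].values()) / sum(timing_min["SNiPER"].values())

for section, gain in gains.items():
    print(f"{section:<10} gain = {gain:.1f}")
print(f"overall gain = {total_gain:.1f}")
for fw in timing_min:
    print(f"{fw}: I/O share = {shares[fw]['EDM & I/O']:.0%}, "
          f"analysis share = {shares[fw]['Analysis']:.0%}")
```

Because I/O dominates the BOSS run, the overall gain of about 3.8 is set almost entirely by the I/O gain and by the analysis time, which is nearly unchanged between the two frameworks.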
Figure: Time consumption proportion under BOSS (left) and SNiPER (right) framework.
5 Conclusion
In this article, a new BESIII physics analysis framework based on SNiPER has been introduced, with SmartRef implemented in the reconstructed event data model for fast event pre-selection. The new framework was tested with a real physics analysis at the CM energy of GeV, yielding exactly the same results as the original BOSS framework. In the test, SNiPER was 3.8 times faster than BOSS in total execution time, saving more than 70% of the time for this specific physics channel. Further tests showed that the improvement comes mainly from the new event data model with SmartRef, which speeds up I/O by a factor of 10.3 compared to BOSS.
We conclude that the new physics analysis framework based on SNiPER significantly improves the I/O performance through its redesigned reconstructed event data model using SmartRef. The framework is ready for physics analysis in BESIII, and the first stable version of SNiPERMT, which is suitable for the concurrent environment, will be released in 2017.
References
- [2] BESIII Collaboration, Nuclear Instruments and Methods in Physics Research A, 598: 7–11 (Jan. 2009)
- [3] D. M. Barnes, T. Bian, J. M. Bigi et al, arXiv:0809.1869
- [4] G. Barrand, I. Belyaev, P. Binko et al, Computer Physics Communications, 140: 45–55 (Oct. 2001)
- [5] W. D. Li, H. M. Liu, Z. Y. Deng et al, in Proceedings of CHEP06 (Computing in High Energy and Nuclear Physics), Mumbai, India (Feb. 2006)
- [6] J. H. Zou, X. T. Huang, W. D. Li et al, Journal of Physics: Conference Series, 664: 072053 (2015)
- [7] F. An, G. An, Q. An et al, Journal of Physics G: Nuclear Physics, 43: 030401 (Mar. 2016)
- [8] G. Di Sciascio on behalf of the LHAASO Collaboration, arXiv:1602.07600
