Efficient loading of reduced data ensembles produced at ORNL SNS/HFIR neutron time-of-flight facilities
William F Godoy, Andrei T Savici, Steven E Hahn, Peter F Peterson

TL;DR
This paper introduces algorithmic improvements for loading reduced neutron scattering data ensembles at ORNL, significantly enhancing efficiency and scalability in data management for large experimental datasets.
Contribution
It presents scalable search and extraction algorithms using an in-memory binary-tree index, optimizing data loading in Mantid for neutron scattering experiments.
Findings
Average speed-up of 19-23% in data loading times
Effective handling of large, complex datasets
Improved scalability for growing data volumes
Abstract
We present algorithmic improvements to the loading operations of certain reduced data ensembles produced from neutron scattering experiments at Oak Ridge National Laboratory (ORNL) facilities. Ensembles from multiple measurements are required to cover a wide range of the phase space of a sample material of interest. They are stored using the standard NeXus schema in individual HDF5 files. This becomes a scalability challenge as the number of experiments stored in a single ensemble file increases. The present work follows up on our previous efforts on data management algorithms to address identified input/output (I/O) bottlenecks in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. We reuse an in-memory binary-tree metadata index that resembles data access patterns, to provide a scalable search and extraction mechanism. In…
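To illustrate the idea of an ordered in-memory metadata index, the following is a minimal sketch and not Mantid's implementation. It assumes HDF5-style paths as keys and uses a sorted list with `bisect` as a stand-in for a balanced binary tree, which gives comparable logarithmic lookups. The class name `MetadataIndex`, its methods, and every path shown are hypothetical.

```python
import bisect


class MetadataIndex:
    """Ordered in-memory index from HDF5-style paths to a small payload.

    Assumption: a sorted list searched with bisect stands in for a
    balanced binary tree; both give O(log n) lookups on ordered keys.
    """

    def __init__(self, entries):
        # Sort once so that all entries sharing a prefix are contiguous.
        self._items = sorted(entries.items())
        self._keys = [key for key, _ in self._items]

    def get(self, path):
        """Return the payload stored for an exact path, or None."""
        i = bisect.bisect_left(self._keys, path)
        if i < len(self._keys) and self._keys[i] == path:
            return self._items[i][1]
        return None

    def children(self, group):
        """Return every indexed path below a group, in sorted order."""
        base = group.rstrip("/")
        lo = bisect.bisect_left(self._keys, base + "/")
        # '0' is the character right after '/', so base + '0' is the
        # smallest string greater than every path starting with base + '/'.
        hi = bisect.bisect_left(self._keys, base + "0")
        return self._keys[lo:hi]


# Hypothetical ensemble layout, for illustration only.
entries = {
    "/ensemble/run_0001/data/signal": "float64",
    "/ensemble/run_0001/data/errors": "float64",
    "/ensemble/run_0002/data/signal": "float64",
    "/ensemble/run_0010/data/signal": "float64",
}
index = MetadataIndex(entries)
run_paths = index.children("/ensemble/run_0001")
```

Because keys are kept in order, extracting one run from the ensemble touches only a contiguous slice of the index, rather than scanning every entry in the file.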
Taxonomy
Topics: Nuclear Physics and Applications · Radiation Detection and Scintillator Technologies · Scientific Computing and Data Management
