Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL
William F Godoy, Peter F Peterson, Steven E Hahn, Jay J Billings

TL;DR
This paper presents new data management algorithms that significantly improve the efficiency of neutron scattering data reduction workflows at ORNL by reducing I/O bottlenecks and optimizing metadata handling.
Contribution
The work introduces an in-memory binary-tree metadata index and redesigned data encapsulation in Mantid to enhance data processing speed and scalability.
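To make the idea of an in-memory metadata index concrete, here is a minimal sketch, not the Mantid implementation. It assumes the index stores hierarchical entry paths once at load time and answers lookups from memory rather than by re-reading file metadata. A sorted list searched with binary search stands in for the binary tree; both give logarithmic lookups. The class name `MetadataIndex`, its methods, and the sample paths are hypothetical.

```python
from bisect import bisect_left


class MetadataIndex:
    """Hypothetical sorted in-memory index of hierarchical entry paths."""

    def __init__(self):
        # Kept sorted so that membership and prefix queries use binary search.
        self._paths = []

    def add(self, path):
        """Insert a path, ignoring duplicates."""
        i = bisect_left(self._paths, path)
        if i == len(self._paths) or self._paths[i] != path:
            self._paths.insert(i, path)

    def __contains__(self, path):
        i = bisect_left(self._paths, path)
        return i < len(self._paths) and self._paths[i] == path

    def children(self, prefix):
        """Return every indexed path below `prefix`, in sorted order."""
        prefix = prefix.rstrip("/") + "/"
        # Paths sharing a prefix are contiguous in lexicographic order.
        i = bisect_left(self._paths, prefix)
        found = []
        while i < len(self._paths) and self._paths[i].startswith(prefix):
            found.append(self._paths[i])
            i += 1
        return found


# Illustrative paths only; real files define their own layout.
index = MetadataIndex()
for p in [
    "/entry/bank1_events/event_id",
    "/entry/DASlogs/proton_charge/value",
    "/entry/DASlogs/proton_charge/time",
]:
    index.add(p)

log_entries = index.children("/entry/DASlogs")
```

Building the index once and querying it repeatedly is what would avoid reconstructing metadata on every access; the choice of a sorted list here is only for brevity.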
Findings
Achieved 11-30% speedup in data reduction workflows
Reduced metadata I/O reconstruction time
Demonstrated scalability with instrument-specific data
Abstract
Oak Ridge National Laboratory (ORNL) experimental neutron science facilities produce 1.2 TB per day of raw event-based data, stored using the standard metadata-rich NeXus schema built on top of the HDF5 file format. The performance of several data reduction workflows is largely determined by the time spent in the loading and processing algorithms of Mantid, an open-source data analysis framework used at several neutron science facilities around the world. The present work introduces new data management algorithms to address identified input/output (I/O) bottlenecks in Mantid. First, we introduce an in-memory binary-tree metadata index that resembles NeXus data access patterns to provide a scalable search and extraction mechanism. Second, data encapsulation in Mantid algorithms is optimally redesigned to reduce the total compute and memory runtime footprint associated…
