Accelerating Time-to-Science by Streaming Detector Data Directly into Perlmutter Compute Nodes
Samuel S. Welborn, Bjoern Enders, Chris Harris, Peter Ercius, Deborah J. Bard

TL;DR
This paper presents a streaming data workflow that directly transfers high-rate detector data into compute node memory, significantly reducing bottlenecks and accelerating data analysis for time-sensitive scientific experiments.
Contribution
The paper introduces a streaming workflow that uses ZeroMQ-based services and a distributed key-value store to bypass storage I/O, enabling faster data processing at HPC centers.
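To make the produce, aggregate, and distribute pattern concrete, here is a minimal in-memory sketch. It is not the authors' implementation: standard-library queues stand in for ZeroMQ sockets, a plain dictionary guarded by a lock stands in for the distributed key-value store, and every name (`producer`, `aggregator`, `worker`, `run_pipeline`, the notion of a frame arriving in several "parts") is a hypothetical choice made for illustration.

```python
import queue
import threading


def producer(out_q, n_frames, n_parts):
    """Emit (frame_id, part_id, payload) messages; payloads are placeholders."""
    for frame_id in range(n_frames):
        for part_id in range(n_parts):
            payload = bytes([frame_id % 256]) * 4  # stand-in for detector data
            out_q.put((frame_id, part_id, payload))
    out_q.put(None)  # sentinel: end of stream


def aggregator(in_q, worker_qs, n_parts):
    """Reassemble complete frames and hand each one to a worker, round-robin by frame id."""
    partial = {}
    while True:
        msg = in_q.get()
        if msg is None:
            break
        frame_id, part_id, payload = msg
        parts = partial.setdefault(frame_id, {})
        parts[part_id] = payload
        if len(parts) == n_parts:
            frame = b"".join(parts[i] for i in range(n_parts))
            worker_qs[frame_id % len(worker_qs)].put((frame_id, frame))
            del partial[frame_id]
    for q in worker_qs:
        q.put(None)  # tell every worker the stream has ended


def worker(name, in_q, kv_store, lock):
    """Consume frames from memory and publish a frame count to the shared store."""
    count = 0
    while True:
        msg = in_q.get()
        if msg is None:
            break
        count += 1  # stand-in for real on-the-fly analysis of the frame
    with lock:
        kv_store[name] = count


def run_pipeline(n_frames=8, n_parts=4, n_workers=2):
    """Run the three stages in threads; no stage touches the file system."""
    raw_q = queue.Queue()
    worker_qs = [queue.Queue() for _ in range(n_workers)]
    kv_store, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=producer, args=(raw_q, n_frames, n_parts)),
        threading.Thread(target=aggregator, args=(raw_q, worker_qs, n_parts)),
    ]
    threads += [
        threading.Thread(target=worker, args=(f"worker-{i}", q, kv_store, lock))
        for i, q in enumerate(worker_qs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kv_store


if __name__ == "__main__":
    print(run_pipeline())
```

The point of the sketch is only the data path: frames move from producer to consumer entirely in memory, and the shared store carries coordination state rather than bulk data. A real deployment would replace the queues with network sockets and the dictionary with an actual distributed store.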
Findings
Achieved up to a 14-fold increase in data throughput
Improved predictability and reliability over traditional I/O workflows
Integrated the system with the detector's science gateway and web frontend
Abstract
Recent advancements in detector technology have significantly increased the size and complexity of experimental data, and high-performance computing (HPC) provides a path towards more efficient and timely data processing. However, movement of large data sets from acquisition systems to HPC centers introduces bottlenecks owing to storage I/O at both ends. This manuscript introduces a streaming workflow designed for a high-data-rate electron detector that streams data directly to compute node memory at the National Energy Research Scientific Computing Center (NERSC), thereby avoiding storage I/O. The new workflow deploys ZeroMQ-based services for data production, aggregation, and distribution for on-the-fly processing, all coordinated through a distributed key-value store. The system is integrated with the detector's science gateway and utilizes the NERSC Superfacility API to initiate…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
