Linking Scientific Instruments and HPC: Patterns, Technologies, Experiences
Rafael Vescovi, Ryan Chard, Nickolaus Saint, Ben Blaiszik, Jim Pruyne, Tekin Bicer, Alex Lavens, Zhengchun Liu, Michael E. Papka, Suresh Narayanan, Nicholas Schwarz, Kyle Chard, Ian Foster

TL;DR
This paper reviews patterns and methods for linking scientific instruments with high-performance computing to enable real-time data analysis and processing, sharing practical experiences from five scientific instruments.
Contribution
It introduces common patterns and instantiation methods for configuring distributed computing pipelines connecting instruments and HPC resources, supported by real-world applications.
Findings
Effective data filtering and analysis pipelines were developed.
Application of methods improved data processing efficiency.
The paper offers insights into the operational implications for scientific facilities.
Abstract
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analyses require methods for configuring and running high-performance distributed computing pipelines--what we call flows--linking instruments, HPC (e.g., for analysis, simulation, AI model training), edge computing (for analysis), data stores, metadata catalogs, and high-speed networks. In this article, we review common patterns associated with such flows and describe methods for instantiating those patterns. We also present experiences with the application of these methods to the processing of data from five different scientific instruments, each…
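As a rough illustration of the "flow" idea described in the abstract, the following minimal Python sketch chains an edge-side filter (discarding uninteresting data elements), an analysis step standing in for HPC processing, and a metadata catalog step. It is not the authors' implementation: the names `Frame`, `edge_filter`, `hpc_analyze`, `catalog`, and `run_flow`, the `signal` quality metric, and the threshold value are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical detector frame; `signal` is a stand-in quality metric."""
    frame_id: int
    signal: float


State = Dict[str, object]
Step = Callable[[State], State]


def edge_filter(threshold: float) -> Step:
    """Build a step that keeps only frames whose signal meets the threshold."""
    def step(state: State) -> State:
        kept = [f for f in state["frames"] if f.signal >= threshold]
        return {**state, "frames": kept}
    return step


def hpc_analyze(state: State) -> State:
    """Placeholder for an HPC analysis: computes the mean signal of kept frames."""
    frames = state["frames"]
    mean = sum(f.signal for f in frames) / len(frames) if frames else 0.0
    return {**state, "mean_signal": mean}


def catalog(state: State) -> State:
    """Placeholder for metadata cataloging: records which frames were analyzed."""
    record = {
        "frame_ids": [f.frame_id for f in state["frames"]],
        "mean_signal": state["mean_signal"],
    }
    return {**state, "catalog_record": record}


def run_flow(steps: List[Tuple[str, Step]], state: State) -> State:
    """Run named steps in order, appending each step name to a log."""
    for name, step in steps:
        state = step(state)
        state = {**state, "log": list(state.get("log", [])) + [name]}
    return state


if __name__ == "__main__":
    frames = [Frame(0, 0.1), Frame(1, 0.9), Frame(2, 0.5), Frame(3, 0.7)]
    flow = [
        ("edge_filter", edge_filter(threshold=0.5)),  # assumed threshold
        ("hpc_analyze", hpc_analyze),
        ("catalog", catalog),
    ]
    result = run_flow(flow, {"frames": frames})
    print(result["catalog_record"])
    print(result["log"])
```

In a real deployment each step would presumably be a remote action (a data transfer, a batch job, a catalog ingest) rather than a local function call; the sketch only shows how a flow composes such steps and how early filtering reduces what later steps must handle.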
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
