AI-coupled HPC Workflows
Shantenu Jha, Vincent R. Pascuzzi, Matteo Turilli

TL;DR
This paper explores the integration of AI/ML models into HPC workflows, highlighting various modes, challenges, solutions, and the significant scientific advancements enabled by their convergence.
Contribution
It provides a comprehensive overview of AI-coupled HPC workflows, including modes of integration, real-world use cases, challenges, and middleware solutions, emphasizing their impact on scientific discovery.
Findings
AI/ML integration typically reduces computational needs in HPC workflows compared to traditional methods.
Coupling AI/ML with HPC enhances scientific performance and enables new explorations.
Frameworks and middleware address challenges such as task heterogeneity, adaptivity, and performance in AI-coupled HPC campaigns.
Abstract
Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows have become the "new applications," wherein multi-scale computing campaigns comprise multiple and heterogeneous executable tasks. In particular, the introduction of AI/ML models into traditional HPC workflows has been an enabler of highly accurate modeling, typically reducing computational needs compared to traditional methods. This chapter discusses various modes of integrating AI/ML models with HPC computations, resulting in diverse types of AI-coupled HPC workflows. We motivate the increasing need for coupling AI/ML and HPC across scientific domains, then exemplify it with a number of production-grade use cases for each mode. We additionally discuss the primary challenges of extreme-scale AI-coupled HPC campaigns (task heterogeneity, adaptivity, and performance) and several framework and…
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems
