Employing Artificial Intelligence to Steer Exascale Workflows with Colmena
Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster

TL;DR
This paper introduces Colmena, an AI-driven framework that enhances exascale scientific workflows by enabling adaptive, agent-based control to improve resource utilization and reduce communication overhead on supercomputers.
Contribution
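We designed Colmena to integrate AI with workflow management. It addresses the challenges of running workflows on exascale systems, such as keeping nodes busy and limiting communication overhead, and it enables adaptive, efficient scientific computations across multiple domains.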
Findings
Improved node utilization through AI-driven steering strategies
Reduced communication overhead with data fabrics
Enhanced scientific workflows in chemistry, biophysics, and materials science
Abstract
Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI. The scaling challenges we discuss include developing steering strategies that maximize node utilization, introducing data fabrics that reduce communication overhead of data-intensive tasks, and…
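To make the idea of "cooperative agents that respond to events" concrete, here is a minimal sketch using only the Python standard library. It is a hypothetical illustration, not Colmena's API: the class `SteeringAgents`, the toy objective `simulate`, and the "explore around the best input" policy are all assumptions. One agent keeps every worker busy, and a second agent reacts to each task-completion event by updating shared state that steers which inputs the first agent submits next. A local thread pool stands in for supercomputer nodes.

```python
"""Hypothetical sketch of event-driven, cooperative agents steering a workflow.

Names and policy are illustrative assumptions, NOT Colmena's API.
A local thread pool stands in for the nodes of a supercomputer.
"""
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(x: float) -> float:
    """Stand-in for an expensive simulation (hypothetical objective, best at x=3)."""
    return -(x - 3.0) ** 2


class SteeringAgents:
    """Two agents share state and communicate through a queue of completion events."""

    def __init__(self, n_tasks: int, n_workers: int = 2):
        self.n_tasks = n_tasks
        self.results = queue.Queue()                  # task-completion events
        self.slots = threading.Semaphore(n_workers)   # free "nodes"
        self.lock = threading.Lock()                  # guards shared state below
        self.history = []                             # (input, output) pairs
        self.best_x = 0.0                             # current steering guess
        self.pool = ThreadPoolExecutor(max_workers=n_workers)

    def submitter(self):
        """Agent 1: submit a new task whenever a worker frees up."""
        for i in range(self.n_tasks):
            self.slots.acquire()                      # block until a worker is free
            with self.lock:
                # Explore just below, at, and just above the current best input.
                x = self.best_x + (i % 3 - 1) * 0.5
            fut = self.pool.submit(simulate, x)
            # On completion, emit an event carrying the input and its result.
            fut.add_done_callback(lambda f, x=x: self.results.put((x, f.result())))

    def result_processor(self):
        """Agent 2: respond to each completion event and update the steering guess."""
        for _ in range(self.n_tasks):
            x, y = self.results.get()
            with self.lock:
                self.history.append((x, y))
                self.best_x = max(self.history, key=lambda p: p[1])[0]
            self.slots.release()                      # the worker is free again

    def run(self):
        agents = [threading.Thread(target=self.submitter),
                  threading.Thread(target=self.result_processor)]
        for agent in agents:
            agent.start()
        for agent in agents:
            agent.join()
        self.pool.shutdown()
        return self.history


if __name__ == "__main__":
    history = SteeringAgents(n_tasks=30).run()
    best_x, best_y = max(history, key=lambda p: p[1])
    print(f"best input found: {best_x} (objective {best_y})")
```

The semaphore ties task submission to result processing, so workers stay occupied while each new input reflects whatever results have arrived so far. Because completion order depends on thread scheduling, the exact inputs explored can differ from run to run.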
Taxonomy
Topics: Scientific Computing and Data Management
