Experiences Readying Applications for Exascale
Paul T. Bauman, Reuben D. Budiardja, Dmytro Bykov, Noel Chalmers, Jacqueline Chen, Nicholas Curtis, Marc Day, Markus Eisenbach, Lucas Esclapez, Alessandro Fanfarillo, William Freitag, Nicholas Frontiere, Antigoni Georgiadou, Joseph Glenski, Kalyana Gottiparthi

TL;DR
This paper reviews four years of experience preparing scientific applications for the Oak Ridge Leadership Computing Facility's Frontier exascale system, covering programmability, tuning, portability, and the dissemination of best practices to the community, with an eye toward readiness for future systems.
Contribution
It provides practical insights and best practices for application readiness on exascale systems, based on case studies and community engagement over four years.
Findings
Early access systems aid development across hardware generations
Best practices improve application portability and performance
Community training accelerates exascale readiness
Abstract
The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years in preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. This paper addresses a range of topics in software including programmability, tuning, and portability considerations that are key to moving applications from existing systems to future installations. A set of representative workloads provides case studies for general system and software testing. We evaluate the use of early access systems for development across several generations of hardware. Finally, we discuss how best practices were identified and disseminated to the community through a wide range of activities including user-guides and trainings. We…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management
