Scalable Delivery of Scalable Libraries and Tools: How ECP Delivered a Software Ecosystem for Exascale and Beyond
Michael A. Heroux

TL;DR
The paper describes how the Exascale Computing Project (ECP) successfully managed and delivered a large ecosystem of scientific libraries and tools for exascale computing through effective organizational, management, and quality assurance strategies.
Contribution
It presents the organizational, management, and quality assurance approaches that enabled the scalable and efficient delivery of a large scientific software ecosystem for exascale computing.
Findings
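ECP carried out successful large-scale open-source software development for exascale systems.
It applied project management practices (earned-value management, key performance parameters) and quality assurance practices (community policies, automated testing, continuous integration).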
Lessons learned for future scientific software projects.
Abstract
The Exascale Computing Project (ECP) was one of the largest open-source scientific software development projects ever. It supported approximately 1,000 staff from US Department of Energy laboratories and from university and industry partners. About 250 staff contributed to 70 scientific libraries and tools to support applications on multiple exascale computing systems that were also under development. Funded as a construction project, ECP adopted an earned-value management system based on milestones, and a key performance parameter system based, in part, on integrations. With accelerated delivery schedules and significant project risk, we also emphasized software quality using community policies, automated testing, and continuous integration. Software Development Kit teams provided cross-team collaboration. Products were delivered via E4S, a curated portfolio of libraries and tools. In…
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
