Integrating Deep Learning in Domain Sciences at Exascale
Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, and Junqi Yin

TL;DR
This paper discusses integrating deep learning with high-performance computing at exascale, addressing challenges, proposing new techniques, and introducing the MagmaDNN framework for efficient HPC AI applications.
Contribution
It introduces MagmaDNN, an open-source HPC deep learning framework that seamlessly integrates with existing HPC libraries and addresses current scalability and efficiency challenges.
Findings
Developed asynchronous parallelization techniques for large-scale systems.
Enhanced deep learning performance with mixed-precision and asynchronous optimization.
Demonstrated applications in materials science, imaging, and climate modeling.
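To make the idea of asynchronous optimization concrete, the following is a minimal, self-contained sketch of gradient descent with stale gradients. It is an illustration only, not the paper's method and not MagmaDNN code: the function name `stale_sgd`, the toy objective f(w) = 0.5 * (w - target)^2, and the parameters `lr`, `staleness`, and `steps` are all assumptions introduced here. Real asynchronous training runs workers concurrently; this sketch imitates the effect deterministically by applying each gradient a fixed number of updates after the weights it was computed from.

```python
from collections import deque


def stale_sgd(target, lr=0.1, staleness=3, steps=200):
    """Simulate asynchronous SGD on f(w) = 0.5 * (w - target)**2.

    Hypothetical toy model: every applied gradient was computed from the
    weight value seen `staleness` updates earlier, mimicking a worker that
    reads parameters, computes for a while, and pushes its gradient late.
    """
    w = 0.0
    # Sliding window of past weights; the oldest entry is the value a
    # slow worker read before starting its computation.
    history = deque([w] * (staleness + 1), maxlen=staleness + 1)
    for _ in range(steps):
        stale_w = history[0]      # parameters as read by the worker
        grad = stale_w - target   # gradient of f at the stale point
        w -= lr * grad            # the late gradient is applied to current w
        history.append(w)
    return w


if __name__ == "__main__":
    print(stale_sgd(2.0))               # delayed updates
    print(stale_sgd(2.0, staleness=0))  # ordinary synchronous SGD
```

With `staleness=0` the loop reduces to ordinary gradient descent. In this toy setting, delayed updates still converge for a small enough learning rate, but larger delays generally require smaller steps to remain stable.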
Abstract
This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the necessities of an HPC deep learning framework and how those needs can be provided (e.g., as…
Taxonomy
Topics: Machine Learning in Materials Science · Advanced X-ray and CT Imaging · Reservoir Engineering and Simulation Methods
