Distributed Machine Learning for Computational Engineering using MPI

Kailai Xu; Weiqiang Zhu; Eric Darve

arXiv:2011.01349·cs.DC·November 25, 2020·5 cites

Open Access · 2 Repos

TL;DR

This paper introduces a parallel computing framework combining neural network training with PDE solvers, enabling efficient large-scale simulations by parallelizing both components and separating data communication from computation.

Contribution

It presents a novel framework that parallelizes both neural networks and PDE solvers, improving flexibility and scalability in computational engineering tasks.

Findings

01. Achieved substantial acceleration in training coupled neural networks and PDEs.

02. Demonstrated effectiveness on various large-scale problems.

03. Separated data communication from computation for better modularity.

Abstract

We propose a framework for training neural networks that are coupled with partial differential equations (PDEs) in a parallel computing environment. Unlike most distributed computing frameworks for deep neural networks, our focus is to parallelize both numerical solvers and deep neural networks in forward and adjoint computations. Our parallel computing model views data communication as a node in the computational graph for numerical simulations. The advantage of our model is that data communication and computing are cleanly separated, which provides better flexibility, modularity, and testability. We demonstrate using various large-scale problems that we can achieve substantial acceleration by using parallel solvers for PDEs in training deep neural networks that are coupled with PDEs.
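
The idea of treating data communication as a graph node with its own forward and adjoint rule can be illustrated with a minimal sketch. This is not the authors' implementation and does not call MPI: the ranks are simulated in a single process as list entries, and every name here (`allreduce_sum_forward`, `allreduce_sum_adjoint`, `loss_and_grad`, the toy model `u_r = theta * x_r`) is a hypothetical chosen for illustration. It relies only on the standard fact that the adjoint of a sum all-reduce is again a sum all-reduce applied to the upstream gradients.

```python
# Assumption: index r of each list stands in for MPI rank r (no real MPI here).

def allreduce_sum_forward(local_values):
    """Communication node, forward pass: every rank receives the global sum."""
    total = sum(local_values)
    return [total for _ in local_values]


def allreduce_sum_adjoint(upstream_grads):
    """Communication node, adjoint pass.

    Each rank's input contributes to every rank's output, so its gradient
    is the sum of the upstream gradients over all ranks.
    """
    total = sum(upstream_grads)
    return [total for _ in upstream_grads]


def loss_and_grad(theta, xs, targets):
    """Toy pipeline: local compute -> communication node -> local loss."""
    # Forward: purely local computation on each simulated rank.
    u = [theta * x for x in xs]
    # Forward: the only place where ranks exchange data.
    s = allreduce_sum_forward(u)
    # Forward: local loss per rank, summed here only to report one number.
    loss = sum(0.5 * (s_r - t_r) ** 2 for s_r, t_r in zip(s, targets))

    # Adjoint: local derivative of each rank's loss w.r.t. its copy of s.
    s_bar = [s_r - t_r for s_r, t_r in zip(s, targets)]
    # Adjoint: pass gradients back through the communication node.
    u_bar = allreduce_sum_adjoint(s_bar)
    # Adjoint: local chain rule, then accumulate the shared parameter gradient.
    theta_bar = sum(ub * x for ub, x in zip(u_bar, xs))
    return loss, theta_bar


def finite_difference(theta, xs, targets, eps=1e-6):
    """Central difference check of the hand-written adjoint."""
    loss_plus, _ = loss_and_grad(theta + eps, xs, targets)
    loss_minus, _ = loss_and_grad(theta - eps, xs, targets)
    return (loss_plus - loss_minus) / (2 * eps)


xs = [1.0, 2.0, 3.0]          # one data value per simulated rank
targets = [0.5, -1.0, 2.0]    # one target per simulated rank
loss, grad = loss_and_grad(0.7, xs, targets)
fd_grad = finite_difference(0.7, xs, targets)
print("adjoint gradient:", grad)
print("finite-difference gradient:", fd_grad)
```

Because the communication step is isolated in two small functions, it can be checked on its own (here against a finite difference) without touching the local numerical code, which is the kind of modularity and testability the abstract describes.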

Taxonomy

Topics: Reservoir Engineering and Simulation Methods · Model Reduction and Neural Networks · Neural Networks and Applications