Distributed Multigrid Neural Solvers on Megavoxel Domains
Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Chinmay Hegde, Soumik Sarkar, Santi Adavani, Adarsh Krishnamurthy, Baskar Ganapathysubramanian

TL;DR
This paper introduces a scalable distributed training framework for neural PDE solvers on megavoxel domains, combining multigrid-inspired training acceleration with distributed deep learning to efficiently solve high-resolution 3D Poisson equations.
Contribution
The paper presents a multigrid-inspired training method, integrated with distributed deep learning, that makes it practical to train large-scale neural PDE solvers producing high-resolution 3D solutions.
Findings
Achieved scalable training on GPU and CPU clusters.
Successfully trained a 3D Poisson solver up to 512x512x512 resolution.
Demonstrated significant reduction in training time and improved scalability.
Abstract
We consider the distributed training of large-scale neural networks that serve as PDE solvers producing full-field outputs. We specifically consider neural solvers for the generalized 3D Poisson equation over megavoxel domains. A scalable framework is presented that integrates two distinct advances. First, we accelerate the training of a large model via a method analogous to the multigrid technique used in numerical linear algebra. Here, the network is trained using a hierarchy of increasing-resolution inputs in sequence, analogous to the 'V', 'W', 'F', and 'Half-V' cycles used in multigrid approaches. Second, in conjunction with the multigrid approach, we implement a distributed deep learning framework that significantly reduces the time to solve. We show the scalability of this approach on both GPU (Azure VMs on Cloud) and CPU clusters (PSC Bridges2). This approach is deployed to train a…
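The idea of training through a sequence of resolutions can be illustrated with a minimal sketch of how such a schedule might be generated. This is not the authors' implementation: the function names (`resolution_levels`, `half_v_cycle`, `v_cycle`), the factor-of-two coarsening, and the exact ordering of levels are assumptions borrowed from classical multigrid, where a V cycle moves from the finest grid down to the coarsest and back, and a Half-V cycle covers only the coarse-to-fine leg. The 'W' and 'F' cycles are omitted because their ordering is not specified in the text above.

```python
def resolution_levels(finest, num_levels):
    """Return grid resolutions from coarsest to finest.

    Assumes each coarser level halves the resolution per axis
    (an illustrative choice, not taken from the paper).
    """
    return [finest // (2 ** k) for k in reversed(range(num_levels))]


def half_v_cycle(levels):
    """Coarse-to-fine schedule: train at each level once, ending at the finest."""
    return list(levels)


def v_cycle(levels):
    """Fine-to-coarse-to-fine schedule, visiting the coarsest level once."""
    descending = list(reversed(levels))
    ascending = list(levels[1:])  # skip the coarsest, already visited
    return descending + ascending


levels = resolution_levels(512, 4)
print("levels:", levels)
print("Half-V:", half_v_cycle(levels))
print("V:", v_cycle(levels))
```

In a training loop, each entry in the schedule would correspond to a stage in which the same network is trained on inputs resampled to that resolution, so the cheap coarse stages do most of the early work before the expensive 512x512x512 stage.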
