A Simple Communication Scheme for Distributed Fast Multipole Methods
Srinath Kailasa

TL;DR
This paper introduces a simple hierarchical communication scheme for distributed Fast Multipole Methods, built on MPI neighborhood collectives and uniform trees, that weak-scales to 512 nodes of the ARCHER2 supercomputer.
Contribution
The authors propose a minimal-redesign communication approach for distributed FMMs that maintains shared memory optimizations and scales to very large problem sizes.
Findings
Achieved weak scaling up to 3.2e10 uniformly distributed points on 512 nodes.
Demonstrated scalability on the ARCHER2 supercomputer.
The uniform-tree simplification gives worse asymptotic scaling for non-uniform points, but runtimes remain practically useful because the shared-memory optimizations are retained.
Abstract
We present a simple hierarchical communication scheme for distributed Fast Multipole Methods (FMMs) based on MPI neighborhood collectives and uniform trees. The method targets the common case of extending an existing high-performance shared-memory uniform-tree FMM implementation to distributed memory with minimal redesign while preserving any shared-memory optimizations. Benchmarks on the ARCHER2 supercomputer demonstrate that our method can scale to very large problem sizes: in our largest runs we demonstrate weak scaling up to 3.2e10 uniformly distributed points on 512 nodes of the machine. Our simplifications based on uniform trees result in worse asymptotic scaling for non-uniform points; however, we still obtain practically useful runtimes because we retain our shared-memory optimizations.
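To make the idea of neighborhood communication on a uniform tree concrete, the following is a minimal sketch and not the paper's implementation. It assumes (none of this is stated in the abstract) that leaf boxes at a fixed level are ordered by Morton key, that each rank owns an equal contiguous range of keys, and that a rank only needs to talk to ranks owning geometrically adjacent boxes. The helpers `morton_encode`, `owner_rank`, and `neighbour_ranks` are hypothetical names introduced for illustration. In an MPI code, a list like the one computed here is the kind of input one would hand to `MPI_Dist_graph_create_adjacent` before exchanging data with a neighborhood collective such as `MPI_Neighbor_alltoallv`.

```python
from itertools import product


def morton_encode(ix, iy, iz, level):
    """Interleave the bits of integer box coordinates into a Morton key."""
    key = 0
    for bit in range(level):
        key |= ((ix >> bit) & 1) << (3 * bit)
        key |= ((iy >> bit) & 1) << (3 * bit + 1)
        key |= ((iz >> bit) & 1) << (3 * bit + 2)
    return key


def owner_rank(key, level, n_ranks):
    """Assumed partition: equal contiguous Morton ranges per rank.

    Requires n_ranks to divide 8**level (an assumption of this sketch).
    """
    boxes_per_rank = 8 ** level // n_ranks
    return key // boxes_per_rank


def neighbour_ranks(rank, level, n_ranks):
    """Ranks owning at least one box adjacent to a box owned by `rank`."""
    side = 2 ** level  # boxes per dimension in the uniform tree at this level
    neighbours = set()
    for ix, iy, iz in product(range(side), repeat=3):
        if owner_rank(morton_encode(ix, iy, iz, level), level, n_ranks) != rank:
            continue
        # Visit the (up to) 26 adjacent boxes, skipping those outside the domain.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            jx, jy, jz = ix + dx, iy + dy, iz + dz
            if 0 <= jx < side and 0 <= jy < side and 0 <= jz < side:
                other = owner_rank(morton_encode(jx, jy, jz, level), level, n_ranks)
                if other != rank:
                    neighbours.add(other)
    return sorted(neighbours)
```

Because the tree is uniform, the neighbor list depends only on the level and the partition, not on the point data, so in this sketch every rank can compute it locally without any communication. A full FMM would also need boxes beyond immediate adjacency (for example, interaction lists), which this sketch leaves out.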
