TL;DR
This paper presents an optimized, large-scale boundary integral equation solver for wave scattering, accelerated by the fast multipole method (FMM). It reaches 77% of peak performance on Skylake processors, shows near-linear weak scalability up to 6,144 nodes, and handles over 2 billion degrees of freedom on a Cray XC40.
Contribution
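It introduces an architecture-aware FMM-based solver for wave scattering that targets anticipated exascale systems. The Helmholtz kernels are optimized for both shared and distributed memory, exploiting thread-level parallelism and the AVX-512 SIMD units of Intel Skylake and Knights Landing processors.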
Findings
Achieves 77% of peak performance on Skylake processors.
Demonstrates near-linear weak scalability up to 6,144 nodes.
Solves problems with over 2 billion degrees of freedom on a Cray XC40.
Abstract
Algorithmic and architecture-oriented optimizations are essential for achieving performance worthy of anticipated energy-austere exascale systems. In this paper, we present an extreme scale FMM-accelerated boundary integral equation solver for wave scattering, which uses FMM as a matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels treat nontrivial singular and near-field integration points. We implement highly optimized kernels for both shared and distributed memory, targeting emerging Intel extreme performance HPC architectures. We extract the potential thread- and data-level parallelism of the key Helmholtz kernels of FMM. Our application code is well optimized to exploit the AVX-512 SIMD units of Intel Skylake and Knights Landing architectures. We provide different performance models for tuning the task-based tree traversal implementation of FMM,…
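The phrase "uses FMM as a matrix-vector multiplication inside the GMRES iterative method" can be illustrated with a small matrix-free sketch. This is not the paper's code. It assumes a toy system (I + alpha G) x = b on 30 points of the unit sphere, where G is the free-space Helmholtz kernel exp(ikr)/(4 pi r), and it evaluates G by direct O(N^2) summation in the place where an FMM would be called. The names helmholtz_matvec, sphere_points, apply_A, alpha, and the choice of a plane-wave right-hand side are all hypothetical, and the singular self term is simply skipped rather than integrated as the paper describes.

```python
import cmath
import math

def helmholtz_matvec(points, x, k):
    """Direct sum y_i = sum_{j != i} exp(i k r_ij) / (4 pi r_ij) * x_j.
    Stand-in for the FMM: an FMM would approximate this product faster."""
    y = []
    for i, pi in enumerate(points):
        acc = 0j
        for j, pj in enumerate(points):
            if i != j:  # toy treatment: skip the singular self term
                r = math.dist(pi, pj)
                acc += cmath.exp(1j * k * r) / (4.0 * math.pi * r) * x[j]
        y.append(acc)
    return y

def gmres(apply_A, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES (Arnoldi with modified Gram-Schmidt, Givens rotations).
    Only needs a function that applies A to a vector, never the matrix itself."""
    n = len(b)
    beta = math.sqrt(sum(abs(v) ** 2 for v in b))
    if beta == 0.0:
        return [0j] * n, 0
    V = [[v / beta for v in b]]   # orthonormal Krylov basis
    R, rot, g = [], [], [beta + 0j]
    for j in range(min(max_iter, n)):
        w = apply_A(V[j])         # the only place the operator is used
        h = []
        for vi in V:
            hij = sum(a.conjugate() * c for a, c in zip(vi, w))
            w = [wc - hij * a for wc, a in zip(w, vi)]
            h.append(hij)
        h_next = math.sqrt(sum(abs(v) ** 2 for v in w))
        h.append(h_next + 0j)
        for i, (c, s) in enumerate(rot):  # apply earlier rotations
            h[i], h[i + 1] = c * h[i] + s * h[i + 1], -s.conjugate() * h[i] + c * h[i + 1]
        a, bb = h[j], h[j + 1]
        norm = math.sqrt(abs(a) ** 2 + abs(bb) ** 2)
        if abs(a) == 0.0:
            c, s = 0.0, 1 + 0j
        else:
            c, s = abs(a) / norm, (a / abs(a)) * bb.conjugate() / norm
        rot.append((c, s))
        h[j] = c * a + s * bb     # new rotation zeroes h[j + 1]
        R.append(h[: j + 1])      # column j of the triangular factor
        g.append(-s.conjugate() * g[j])
        g[j] = c * g[j]
        if abs(g[j + 1]) <= tol * beta:
            break
        V.append([v / h_next for v in w])
    m = len(R)
    y = [0j] * m
    for i in range(m - 1, -1, -1):  # back substitution
        acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, m))
        y[i] = acc / R[i][i]
    x = [sum(y[col] * V[col][i] for col in range(m)) for i in range(n)]
    return x, m

def sphere_points(n):
    """Deterministic, distinct points on the unit sphere (Fibonacci spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return pts

points = sphere_points(30)
k, alpha = 2.0, 0.5               # hypothetical wavenumber and coupling

def apply_A(x):
    """Toy operator (I + alpha * G); G is applied matrix-free."""
    gx = helmholtz_matvec(points, x, k)
    return [xi + alpha * gi for xi, gi in zip(x, gx)]

b = [cmath.exp(1j * k * p[2]) for p in points]  # plane wave sampled at the points
x, iters = gmres(apply_A, b)
ax = apply_A(x)
rel_res = math.sqrt(sum(abs(bi - ai) ** 2 for bi, ai in zip(b, ax))) / math.sqrt(
    sum(abs(bi) ** 2 for bi in b)
)
print(iters, rel_res)
```

Because GMRES touches the operator only through apply_A, swapping the direct sum for an FMM changes the cost of each iteration without changing the solver loop, which is the structure the abstract describes.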
