H2Opus: A distributed-memory multi-GPU software package for non-local operators
Stefano Zampini, Wajih Boukaram, George Turkiyyah, Omar Knio, David E. Keyes

TL;DR
H2Opus is a high-performance distributed-memory GPU software package that efficiently handles large-scale hierarchical matrices for non-local operators, enabling scalable solutions for complex integral equations.
Contribution
This paper introduces new distributed-memory GPU algorithms for hierarchical matrix operations, integrated into the H2Opus package, supporting scalable large-scale computations.
Findings
Achieved near-ideal scalability up to 1024 GPUs.
Exceeded 2.3 Tflop/s/GPU in matrix-vector multiplication.
Demonstrated efficient solution of large problems with 16 million degrees of freedom.
Abstract
Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and operator application makes them particularly suited for large-scale problems. As a result, there is a need for software that provides support for distributed operations on these matrices to allow large-scale problems to be represented. In this paper, we present high-performance, distributed-memory GPU-accelerated algorithms and implementations for matrix-vector multiplication and matrix recompression of hierarchical matrices in the $\mathcal{H}^2$ format. The algorithms are a new module of H2Opus, a performance-oriented package that supports a broad variety of $\mathcal{H}^2$-matrix operations on CPUs and GPUs. Performance in the distributed GPU…
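To make the complexity claim in the abstract concrete, the following is a minimal sketch of why a block of a kernel matrix that couples two well-separated point clusters can be applied cheaply. It is not the H2Opus API and not the paper's algorithm: the kernel 1/(x - y), the function names `dense_matvec` and `low_rank_matvec`, and the flat Taylor factorization are all assumptions chosen for illustration, whereas the hierarchical format described in the paper uses nested bases across a cluster tree.

```python
# Illustrative only: hypothetical helper names, not part of H2Opus.

def dense_matvec(xs, ys, v):
    """Apply the block K[i][j] = 1 / (xs[i] - ys[j]) to v directly: O(m * n) work."""
    return [sum(vj / (xi - yj) for yj, vj in zip(ys, v)) for xi in xs]


def low_rank_matvec(xs, ys, v, rank):
    """Apply the same block through a rank-`rank` factorization: O(rank * (m + n)) work.

    Uses the Taylor expansion 1/(x - y) = sum_k (y - c)^k / (x - c)^(k+1),
    which converges when |y - c| < |x - c|, with c the center of the source cluster.
    """
    c = 0.5 * (min(ys) + max(ys))
    # Step 1: compress the source data into `rank` moments about c.
    moments = [sum((yj - c) ** k * vj for yj, vj in zip(ys, v)) for k in range(rank)]
    # Step 2: evaluate the truncated expansion at every target point.
    return [sum(mk / (xi - c) ** (k + 1) for k, mk in enumerate(moments)) for xi in xs]


# Well-separated clusters: sources in [0, 1], targets in [3, 4].
ys = [j / 63 for j in range(64)]
xs = [3 + i / 63 for i in range(64)]
v = [(-1) ** j * (1 + 0.01 * j) for j in range(64)]

exact = dense_matvec(xs, ys, v)
approx = low_rank_matvec(xs, ys, v, rank=12)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(f"max abs error with rank 12: {max_err:.2e}")
```

For these clusters the ratio |y - c| / |x - c| is at most 0.2, so the truncation error decays geometrically with the rank. A hierarchical matrix applies this idea to every admissible block of a cluster tree and stores only the small dense near-field blocks explicitly.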
Taxonomy
Topics: Matrix Theory and Algorithms · Electromagnetic Scattering and Analysis · Tensor decomposition and applications
