Algorithmic patterns for $\mathcal{H}$-matrices on many-core processors
Peter Zaspel

TL;DR
This paper develops and implements a fully GPU-based hierarchical matrix library, enabling efficient matrix operations on many-core processors with significant speedups over traditional CPU-based methods.
Contribution
It introduces novel parallel algorithmic patterns for $\mathcal{H}$-matrix construction and multiplication tailored for GPU hardware, creating the first entirely GPU-based open-source library.
Findings
Achieves significant speedups compared to CPU-based libraries.
Successfully maps complex $\mathcal{H}$-matrix algorithms to GPU architecture.
Provides an in-depth performance analysis and validation.
Abstract
In this work, we consider the reformulation of hierarchical ($\mathcal{H}$) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). $\mathcal{H}$ matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of $\mathcal{H}$ matrix operations on many-core processors is difficult due to the complex nature of the underlying algorithms. While previous algorithmic advances for many-core hardware focused on accelerating existing $\mathcal{H}$ matrix CPU implementations by many-core processors, we here aim at totally relying on that processor type. As main contribution, we introduce the necessary parallel algorithmic patterns allowing to map the full $\mathcal{H}$ matrix construction and the fast…
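The abstract's point that $\mathcal{H}$ matrices make dense matrix-vector products cheap rests on compressing blocks that couple well-separated point clusters into low-rank factors. The following is a minimal CPU sketch of that idea for a single block, not the paper's GPU implementation. It assumes adaptive cross approximation (ACA) with partial pivoting as the compression step, which is one common choice that the abstract itself does not name. The kernel, the point sets, and all function names are hypothetical and chosen only for illustration.

```python
def kernel(x, y):
    # Hypothetical smooth kernel, used only for this illustration.
    return 1.0 / abs(x - y)

def norm(v):
    return sum(t * t for t in v) ** 0.5

def aca(xs, ys, tol=1e-8, max_rank=20):
    """Approximate the block K[i][j] = kernel(xs[i], ys[j]) by sum_k u_k v_k^T."""
    m, n = len(xs), len(ys)
    us, vs = [], []
    used_rows = set()
    i = 0              # current pivot row
    first_size = None  # size of the first rank-1 term, used as a reference
    for _ in range(min(max_rank, m, n)):
        used_rows.add(i)
        # Row i of the residual (block minus the current approximation).
        row = [kernel(xs[i], ys[j]) - sum(u[i] * v[j] for u, v in zip(us, vs))
               for j in range(n)]
        jp = max(range(n), key=lambda j: abs(row[j]))
        if abs(row[jp]) < 1e-14:
            break
        v = [r / row[jp] for r in row]
        # Column jp of the residual.
        u = [kernel(xs[k], ys[jp]) - sum(a[k] * b[jp] for a, b in zip(us, vs))
             for k in range(m)]
        us.append(u)
        vs.append(v)
        size = norm(u) * norm(v)
        if first_size is None:
            first_size = size
        if size <= tol * first_size:
            break  # heuristic stop: the new term is negligible
        rest = [k for k in range(m) if k not in used_rows]
        if not rest:
            break
        i = max(rest, key=lambda k: abs(u[k]))
    return us, vs

def lowrank_matvec(us, vs, x):
    # y = sum_k u_k (v_k . x); costs O(rank * (m + n)) instead of O(m * n).
    y = [0.0] * len(us[0])
    for u, v in zip(us, vs):
        c = sum(vj * xj for vj, xj in zip(v, x))
        for k in range(len(y)):
            y[k] += c * u[k]
    return y

# Two well-separated clusters: rows in [0, 1), columns in [3, 4).
m = n = 64
xs = [k / m for k in range(m)]
ys = [3.0 + k / n for k in range(n)]
us, vs = aca(xs, ys)

x = [1.0] * n
y_fast = lowrank_matvec(us, vs, x)
y_dense = [sum(kernel(a, b) * xj for b, xj in zip(ys, x)) for a in xs]
rel_err = (max(abs(p - q) for p, q in zip(y_fast, y_dense))
           / max(abs(q) for q in y_dense))
rank = len(us)
print(rank, rel_err)
```

In a full $\mathcal{H}$ matrix, a block cluster tree decides which blocks are compressed this way and which small near-field blocks stay dense. Summing the per-block costs over that tree is what yields the log-linear complexity mentioned in the abstract.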
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Electromagnetic Scattering and Analysis
