An implementation of tensor product patch smoothers on GPU

Cu Cui; Paul Grosse-Bley; Guido Kanschat; Robert Strzodka

arXiv:2405.19004 · math.NA · May 7, 2025 · 1 citation

Open Access

TL;DR

This paper presents a GPU implementation of vertex-patch smoothers for higher-order finite element methods. By keeping local operations in on-chip memory, the optimized kernel runs at least 2x faster than a straightforward implementation on the Poisson problem.

Contribution

It presents a GPU implementation that localizes multigrid operations and reorganizes them in on-chip memory, which minimizes global data transfer and gives a conflict-free memory access pattern for finite element smoothing in 2D and 3D.

Findings

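01. At least 2x speedup over the straightforward implementation for the Poisson problem, across various polynomial degrees in 2D and 3D

02. Achieves up to 36% of peak performance on an Nvidia A100 GPU

03. Reaches this performance in both single and double precision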

Abstract

We present a GPU implementation of vertex-patch smoothers for higher-order finite element methods in two and three dimensions. Analysis shows that they are not memory bound with respect to GPU DRAM, but with respect to on-chip scratchpad memory. Multigrid operations are optimized through localization and reorganized local operations in on-chip memory, achieving minimal global data transfer and a conflict-free memory access pattern. Performance tests demonstrate that the optimized kernel is at least 2 times faster than the straightforward implementation for the Poisson problem, across various polynomial degrees in 2D and 3D, achieving up to 36% of the peak performance in both single and double precision on an Nvidia A100 GPU.
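
The abstract does not show the kernels themselves, so the following is only a generic illustration of why tensor product structure matters for local patch operations, not the authors' CUDA code. It assumes a 2D local operator of the separable form A = K ⊗ M + M ⊗ K, which is the standard shape of a Poisson operator on a Cartesian patch. The 1D matrices `K` and `M`, the helper names, and the patch size are all hypothetical choices for the sketch. Applying A through small 1D matrix products gives the same result as the assembled operator while only ever touching n x n arrays, which is the kind of data that fits in on-chip memory.

```python
# Sketch: apply A = kron(K, M) + kron(M, K) to a patch vector without
# assembling A. All names and matrices here are illustrative assumptions.

def tridiag(n, lower, diag, upper):
    """Return an n x n tridiagonal matrix as a list of rows."""
    return [[diag if i == j else lower if i == j + 1 else upper if j == i + 1 else 0.0
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain dense matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron(A, B):
    """Kronecker product; row index (i, j) and column index (k, l), i and k outer."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def apply_dense(K, M, U):
    """Reference: assemble the n^2 x n^2 operator and multiply, O(n^4) work."""
    n = len(U)
    A = matadd(kron(K, M), kron(M, K))
    u = [x for row in U for x in row]  # row-major flattening of the patch values
    v = [sum(a * x for a, x in zip(row, u)) for row in A]
    return [v[i * n:(i + 1) * n] for i in range(n)]

def apply_tensor(K, M, U):
    """Tensor product form: K U M^T + M U K^T, O(n^3) work on n x n arrays only."""
    return matadd(matmul(matmul(K, U), transpose(M)),
                  matmul(matmul(M, U), transpose(K)))

# Hypothetical 1D stiffness and mass matrices (linear elements, unit mesh size).
n = 4
K = tridiag(n, -1.0, 2.0, -1.0)
M = tridiag(n, 1.0 / 6.0, 4.0 / 6.0, 1.0 / 6.0)
U = [[float((3 * i + 5 * j) % 7) for j in range(n)] for i in range(n)]

dense = apply_dense(K, M, U)
tensor = apply_tensor(K, M, U)
max_diff = max(abs(dense[i][j] - tensor[i][j]) for i in range(n) for j in range(n))
print("max difference between dense and tensor application:", max_diff)
```

In this form the cost per patch drops from O(n^4) to O(n^3) in 2D, and the working set is a handful of n x n arrays rather than an n^2 x n^2 matrix. How such products are laid out in scratchpad memory to avoid access conflicts is the subject of the paper and is not modeled here.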

Taxonomy

Topics: Tensor decomposition and applications · Computational Physics and Python Applications · Distributed and Parallel Computing Systems