Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores

Jiqun Tu; Ian Karlin; John Camier; Veselin Dobrev; Tzanio Kolev; Stefan Henneking; Omar Ghattas

arXiv:2603.09038 · cs.DC · April 13, 2026

TL;DR

This paper demonstrates that FP64 tensor cores on NVIDIA GPUs can significantly accelerate high-order finite element simulations, achieving up to 2× speedups in key kernels and up to 83% better energy efficiency, with runs scaling across 10,000 GPUs.

Contribution

First direct programming of FP64 tensor cores for large-scale finite element applications, with kernel fusion optimizations leading to substantial performance and energy efficiency gains.

Findings

01 Achieved up to 2× performance speedup in key kernels.

02 Realized up to 83% energy efficiency improvements.

03 Demonstrated near-perfect weak scaling and 90% strong scaling efficiency across 10,000 GPUs.

Abstract

Finite element simulations play a critical role in a wide range of applications, from automotive design to tsunami modeling and computational electromagnetics. Performing these simulations efficiently at the high resolutions needed for practical applications and scientific insights necessitates the use of high-order methods and large-scale supercomputing. While much progress has been made in porting finite element codes to GPU systems in recent years, additional improvements in the efficiency and computational speed of GPU-accelerated high-order finite element simulations are in constant demand. In this paper, we demonstrate that the FP64 tensor cores on NVIDIA GPUs can be used to further accelerate such simulations, achieving significant speedups in key kernels of MFEM, a scalable open-source finite element library widely used in HPC applications. By integrating FP64 tensor cores with…
