Mass Matrix Assembly on Tensor Cores for Implicit Particle-In-Cell Methods

Luca Pennati; Stefano Markidis

arXiv:2604.19286·cs.CE·April 22, 2026

TL;DR

This paper reformulates mass matrix assembly in implicit particle methods as a sequence of matrix multiplications that map onto tensor cores, running up to 3x faster than optimized conventional implementations and reducing ECSIM runtime by 15%.

Contribution

The paper introduces an exact, cell-by-cell reformulation of mass matrix assembly as matrix products matched to tensor-core (MMA) tiles. The reformulation is general with respect to interpolation order and hardware platform.

Findings

01. Up to 3x faster than optimized conventional implementations.

02. Reduced ECSIM runtime by 15%.

03. Applicable to scalar and tensorial mass matrices in particle-in-cell methods.

Abstract

Matrix-multiply-accumulate (MMA) units, or tensor cores, are now widespread across modern computing architectures. Yet, their use for particle-grid operators remains limited. In implicit particle methods, mass-matrix assembly is a reduction-dominated kernel in which weighted outer products of interpolation weights are accumulated over particle support. We show that this operation can be reformulated exactly, cell by cell, as a sequence of matrix products matched to hardware MMA tiles. The formulation is general with respect to interpolation order and hardware platform, and applies to both scalar mass matrices and the tensorial block mass matrix arising in the Energy-Conserving Semi-Implicit Method (ECSIM) for Particle-in-Cell simulations. We introduce particle batching and a support-group decomposition for higher-order shape functions whose stencil extends beyond a single…
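
The core idea in the abstract, that an accumulation of weighted outer products over the particles of one cell equals a single matrix product, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 2D cell with bilinear (first-order) weights and a scalar per-particle weight q, and every name in it (linear_weights_2d, assemble_outer, assemble_matmul, matmul) is hypothetical. It runs on the CPU in pure Python and only demonstrates the algebra that an MMA tile would execute; particle batching and the support-group decomposition are not shown.

```python
# Illustrative sketch only: names and the bilinear-weight choice are assumptions,
# not taken from the paper.

def linear_weights_2d(x, y):
    """Bilinear weights of a particle at local position (x, y) in [0, 1]^2
    for the 4 nodes of its cell, ordered (0,0), (1,0), (0,1), (1,1)."""
    return [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]

def assemble_outer(particles):
    """Conventional form: accumulate q_p * w_p w_p^T particle by particle."""
    n = 4
    M = [[0.0] * n for _ in range(n)]
    for x, y, q in particles:
        w = linear_weights_2d(x, y)
        for i in range(n):
            for j in range(n):
                M[i][j] += q * w[i] * w[j]
    return M

def matmul(A, B):
    """Plain dense product C = A B for lists of lists."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def assemble_matmul(particles):
    """Matrix-product form: M = A W, where W (P x 4) stacks the particle
    weights row by row and A (4 x P) is W transposed with column p scaled by q_p."""
    W = [linear_weights_2d(x, y) for x, y, _ in particles]
    A = [[particles[p][2] * W[p][i] for p in range(len(W))] for i in range(4)]
    return matmul(A, W)
```

Both functions return the same 4 x 4 cell-local matrix up to floating-point rounding. In the second form the reduction over particles becomes the inner dimension of the product, which is the dimension a hardware MMA unit accumulates over.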
