A Matrix-Free Galerkin Multigrid Solver and Failure-Mode Screen for Single-GPU 3D SIMP Linear Systems

Shaoliang Yang; Jun Wang; Yunsheng Wang

arXiv:2604.26441 · cs.CE · April 30, 2026

TL;DR

This paper presents a matrix-free Galerkin geometric multigrid (GMG) solver for large 3D SIMP linear systems on a single GPU, together with a diagnostic failure-mode screen and an evaluation of a guarded BF16 mixed-precision variant.

Contribution

It introduces a matrix-free Galerkin multigrid hierarchy and a failure-mode screening tool, and it assesses mixed-precision variants for GPU-based 3D SIMP solves.

Findings

01

Pass rates fall as problem size grows: under a 200-iteration budget, 7/9, 4/9, and 1/9 cases converge at 64k, 216k, and 512k elements.

02

FP32-GMG outperforms BF16-GMG in both wall-time ratio and convergence.

03

Largest reported solve handles 1 million elements in under 2 seconds.

Abstract

Large 3D SIMP studies require repeated elasticity solves for density-dependent operators whose finest matrices are expensive to assemble and whose conditioning degrades under high contrast. We study this linear-solver layer rather than claiming end-to-end optimization acceleration. The solver builds a matrix-free Galerkin geometric multigrid (GMG) hierarchy around a fused fine operator: the finest level remains matrix-free, the first coarse level is assembled by local Galerkin aggregation, and deeper levels use sparse Galerkin products. The practical default is FP32-GMG; BF16 is evaluated as a guarded mixed-precision variant and diagnostic stress test, not as the main speed mechanism. In a 27-case heterogeneous cantilever sweep, pass rates under a 200-iteration budget are 7/9, 4/9, and 1/9 at 64k, 216k, and 512k elements; converged-only mean iteration counts are about 112, 134, and 146.…
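
The hierarchy described in the abstract can be illustrated with a deliberately small analogue. The sketch below is not the authors' 3D GPU implementation: it assumes a 1D bar with SIMP-interpolated element stiffness, a single coarse level, a dense coarse solve, and damped Jacobi smoothing. All function names and parameters (`simp_stiffness`, `apply_fine`, `penal`, `e_min`, and so on) are hypothetical. What it does share with the described design is the split of responsibilities: the fine operator is only ever applied, never assembled, and the coarse operator is the Galerkin product P^T A P.

```python
import numpy as np

def simp_stiffness(rho, penal=3.0, e_min=1e-3):
    """SIMP interpolation E(rho) = e_min + rho**penal * (1 - e_min) (assumed form)."""
    return e_min + rho**penal * (1.0 - e_min)

def apply_fine(k, u):
    """Matrix-free 1D bar operator: element stiffness k, interior nodal values u."""
    full = np.concatenate(([0.0], u, [0.0]))   # homogeneous Dirichlet ends
    flux = k * np.diff(full)                   # one internal force per element
    return -np.diff(flux)                      # nodal balance; A is never formed

def prolongation(n_fine_elems):
    """Linear interpolation from coarse to fine interior nodes (even element count)."""
    nc = n_fine_elems // 2
    P = np.zeros((n_fine_elems - 1, nc - 1))
    for j in range(nc - 1):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def galerkin_coarse(k, P):
    """Assemble A_c = P^T A P by applying the matrix-free operator to columns of P."""
    AP = np.column_stack([apply_fine(k, P[:, j]) for j in range(P.shape[1])])
    return P.T @ AP

def two_grid_cycle(k, P, A_c, diag, b, u, omega=2.0 / 3.0, sweeps=2):
    """Pre-smooth, Galerkin coarse correction, post-smooth."""
    for _ in range(sweeps):
        u = u + omega * (b - apply_fine(k, u)) / diag
    r = b - apply_fine(k, u)
    u = u + P @ np.linalg.solve(A_c, P.T @ r)  # dense solve stands in for deeper levels
    for _ in range(sweeps):
        u = u + omega * (b - apply_fine(k, u)) / diag
    return u

n = 64                                         # fine elements
rho = 0.7 + 0.3 * np.sin(np.linspace(0.0, 3.0 * np.pi, n))  # smooth density field
k = simp_stiffness(rho)
P = prolongation(n)
A_c = galerkin_coarse(k, P)
diag = k[:-1] + k[1:]                          # diagonal of the fine operator
b = np.ones(n - 1)
u = np.zeros(n - 1)

history = [np.linalg.norm(b - apply_fine(k, u))]
for _ in range(20):
    u = two_grid_cycle(k, P, A_c, diag, b, u)
    history.append(np.linalg.norm(b - apply_fine(k, u)))
```

Inspecting `history` gives the residual norm after each cycle. The density field here is smooth and of modest contrast; with sharper jumps in `rho`, plain linear interpolation is expected to converge more slowly, which is the regime the abstract's pass-rate sweep appears to probe.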
