A Matrix-Free Galerkin Multigrid Solver and Failure-Mode Screen for Single-GPU 3D SIMP Linear Systems
Shaoliang Yang, Jun Wang, Yunsheng Wang

TL;DR
This paper presents a matrix-free geometric multigrid solver for large 3D SIMP linear systems on a single GPU, including a diagnostic failure-mode screening method and mixed-precision evaluation.
Contribution
It introduces a novel matrix-free Galerkin multigrid hierarchy and a failure-mode screening tool, and it evaluates a guarded BF16 mixed-precision variant against the FP32 default for single-GPU 3D SIMP solves.
Findings
Pass rates depend on problem size and iteration cap: under a 200-iteration budget they fall from 7/9 at 64k elements to 4/9 at 216k and 1/9 at 512k.
FP32-GMG outperforms BF16-GMG in wall-time ratios and convergence.
Largest reported solve handles 1 million elements in under 2 seconds.
Abstract
Large 3D SIMP studies require repeated elasticity solves for density-dependent operators whose finest matrices are expensive to assemble and whose conditioning degrades under high contrast. We study this linear-solver layer rather than claiming end-to-end optimization acceleration. The solver builds a matrix-free Galerkin geometric multigrid (GMG) hierarchy around a fused fine operator: the finest level remains matrix-free, the first coarse level is assembled by local Galerkin aggregation, and deeper levels use sparse Galerkin products. The practical default is FP32-GMG; BF16 is evaluated as a guarded mixed-precision variant and diagnostic stress test, not as the main speed mechanism. In a 27-case heterogeneous cantilever sweep, pass rates under a 200-iteration budget are 7/9, 4/9, and 1/9 at 64k, 216k, and 512k elements; converged-only mean iteration counts are about 112, 134, and 146.…
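The hierarchy described in the abstract (a matrix-free fine operator with Galerkin coarse levels) can be illustrated with a small 1D analogue. The sketch below is not the paper's 3D elasticity solver or its GPU kernels: it assumes a chain of springs whose stiffness follows the standard SIMP interpolation, linear-interpolation prolongation, a damped Jacobi smoother, and a dense direct coarse solve. All function names and parameter values are hypothetical, and the coarse operator is formed by applying the fine operator to each column of P, which is a simple stand-in for the paper's local Galerkin aggregation.

```python
import numpy as np

def simp_stiffness(rho, e_min=1e-3, e0=1.0, p=3.0):
    # Standard SIMP interpolation: E = E_min + rho^p * (E0 - E_min).
    return e_min + rho**p * (e0 - e_min)

def apply_fine(u, k):
    # Matrix-free fine operator for a 1D spring chain with element
    # stiffnesses k (length n + 1) and fixed ends; u holds n interior nodes.
    up = np.concatenate(([0.0], u, [0.0]))  # pad with Dirichlet values
    flux = k * np.diff(up)                  # force carried by each element
    return flux[:-1] - flux[1:]             # net force at interior nodes

def prolongation(nc):
    # Linear interpolation from nc coarse nodes to n = 2*nc + 1 fine nodes.
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P

def galerkin_coarse(k, P):
    # Galerkin coarse operator A_c = P^T A P, built without assembling A:
    # the fine operator is applied to each column of P.
    AP = np.column_stack([apply_fine(P[:, j], k) for j in range(P.shape[1])])
    return P.T @ AP

def two_grid(u, f, k, P, Ac, omega=2.0 / 3.0, nu=2):
    # One two-grid cycle: pre-smooth, coarse correction, post-smooth.
    diag = k[:-1] + k[1:]                   # diagonal of the fine operator
    for _ in range(nu):
        u = u + omega * (f - apply_fine(u, k)) / diag
    r = f - apply_fine(u, k)
    u = u + P @ np.linalg.solve(Ac, P.T @ r)
    for _ in range(nu):
        u = u + omega * (f - apply_fine(u, k)) / diag
    return u

# Heterogeneous density field (assumed values, for illustration only).
rng = np.random.default_rng(0)
nc = 15
n = 2 * nc + 1
rho = rng.uniform(0.3, 1.0, size=n + 1)
k = simp_stiffness(rho)
P = prolongation(nc)
Ac = galerkin_coarse(k, P)

f = np.ones(n)
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, f, k, P, Ac)
print("residual norm after 10 cycles:", np.linalg.norm(f - apply_fine(u, k)))
```

Because the coarse operator is a Galerkin product, the coarse correction is an energy-norm projection, so each cycle cannot increase the energy-norm error in exact arithmetic. How fast the error actually falls depends on the stiffness contrast, which is the effect the pass-rate sweep in the abstract measures at scale.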
