Hierarchical Precision and Recursion for Accelerating Symmetric Linear Solves on MXUs
Vicki Carrica, Rabab Alomairy, Evelyne Ringoot, Alan Edelman

TL;DR
This paper introduces a hierarchical recursive mixed-precision solver for symmetric linear systems on MXUs, achieving significant speedups while maintaining numerical stability and broad hardware portability.
Contribution
It presents a novel recursive, mixed-precision algorithm for symmetric linear solves optimized for MXUs, combining hierarchical recursion with custom data structures for stability and performance.
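The following is a minimal sketch of how recursion and mixed precision can combine, written in Python/NumPy. It shows a recursive SYRK that computes C <- C + alpha*A*A^T and updates only the lower triangle. For illustration it assumes that the off-diagonal block is the part handed to the MXU. That block is an ordinary GEMM and carries most of the flops. Here the MXU is emulated by rounding inputs to FP16 and accumulating in FP32, while the small diagonal base cases stay in FP64. The names `lowp_gemm`, `rec_syrk` and the `leaf` cutoff are hypothetical. This is not the authors' code, and it does not model the paper's custom data structures.

```python
import numpy as np

# Hypothetical sketch; not the authors' implementation.

def lowp_gemm(a, b, dtype=np.float16):
    """Emulate an MXU-style GEMM: inputs rounded to `dtype`, products accumulated in float32."""
    return a.astype(dtype).astype(np.float32) @ b.astype(dtype).astype(np.float32)

def rec_syrk(c, a, alpha=-1.0, leaf=64, gemm=lowp_gemm):
    """Return a copy of C with its lower triangle replaced by that of C + alpha * A @ A.T.

    The upper triangle (above the diagonal) is left exactly as it was in C.
    """
    n = c.shape[0]
    c = c.copy()
    if n <= leaf:
        # Base case: small diagonal block, computed in full (FP64) precision.
        update = a @ a.T
        lower = np.tril_indices(n)
        c[lower] += alpha * update[lower]
        return c
    h = n // 2
    # Off-diagonal block is a plain GEMM: the bulk of the flops, done in low precision.
    c[h:, :h] += alpha * gemm(a[h:], a[:h].T)
    # Diagonal blocks recurse as two smaller SYRKs.
    c[:h, :h] = rec_syrk(c[:h, :h], a[:h], alpha, leaf, gemm)
    c[h:, h:] = rec_syrk(c[h:, h:], a[h:], alpha, leaf, gemm)
    return c
```

Passing an exact `gemm` such as `lambda x, y: x @ y` recovers the FP64 result. This makes it easy to measure how much error the low-precision off-diagonal blocks introduce.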
Findings
14x speedup of SYRK on H200 over cuBLAS
Up to 27x speedup in mixed-precision SYRK
5x overall speedup in Cholesky with improved accuracy
Abstract
Symmetric linear solves are fundamental to a wide range of scientific and engineering applications, from climate modeling and structural analysis to machine learning and optimization. These workloads often rely on Cholesky (POTRF) decomposition and its supporting operations, triangular solves (TRSM) and symmetric rank-k updates (SYRK), which together form the computational core for solving symmetric positive-definite systems. To accelerate these kernels, we present a portable, mixed-precision solver designed for Matrix Processing Units (MXUs), including NVIDIA Tensor Cores (H200) and AMD Matrix Cores (MI300X). Our algorithm builds on a nested recursive formulation in which Cholesky exposes parallelism through recursive decomposition of its TRSM and SYRK sub-problems. This structure yields a hierarchical recursion that maximizes GEMM throughput while enabling fine-grained control over…
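The nested recursion described above can be sketched as a minimal FP64 illustration in Python/NumPy, assuming a simple halving split. Cholesky factors the leading block. A recursive TRSM then computes L21 = A21 L11^{-T}, and a recursive SYRK forms the Schur complement A22 − L21 L21^T. Finally, Cholesky recurses on that complement. Inside TRSM and SYRK, the recursion again reduces most of the work to GEMMs. Those GEMMs are where an MXU would be used, and in a mixed-precision variant they would also be where a lower-precision GEMM is substituted. The names `rec_cholesky`, `rec_trsm`, `rec_syrk` and the `leaf` size are hypothetical. The authors' actual splitting, precision choices, and data layout are not shown here.

```python
import numpy as np

# Hypothetical sketch; not the authors' implementation.

def rec_trsm(l, b, leaf=64):
    """Solve X @ L.T = B for X, with L lower triangular (right-side, transposed TRSM)."""
    n = l.shape[0]
    if n <= leaf:
        # X @ L.T = B  is equivalent to  L @ X.T = B.T
        return np.linalg.solve(l, b.T).T
    h = n // 2
    x1 = rec_trsm(l[:h, :h], b[:, :h], leaf)
    # GEMM update of the trailing right-hand-side columns.
    b2 = b[:, h:] - x1 @ l[h:, :h].T
    x2 = rec_trsm(l[h:, h:], b2, leaf)
    return np.hstack([x1, x2])

def rec_syrk(c, a, leaf=64):
    """Return C - A @ A.T for symmetric C: two recursive diagonal SYRKs plus one GEMM."""
    n = c.shape[0]
    if n <= leaf:
        return c - a @ a.T
    h = n // 2
    out = np.empty_like(c)
    out[:h, :h] = rec_syrk(c[:h, :h], a[:h], leaf)
    out[h:, h:] = rec_syrk(c[h:, h:], a[h:], leaf)
    out[h:, :h] = c[h:, :h] - a[h:] @ a[:h].T  # off-diagonal block is a GEMM
    out[:h, h:] = out[h:, :h].T                # mirror: the result is symmetric
    return out

def rec_cholesky(a, leaf=64):
    """Recursive lower Cholesky factor L with A = L @ L.T."""
    n = a.shape[0]
    if n <= leaf:
        return np.linalg.cholesky(a)
    h = n // 2
    l11 = rec_cholesky(a[:h, :h], leaf)            # POTRF on the leading block
    l21 = rec_trsm(l11, a[h:, :h], leaf)           # TRSM: L21 = A21 L11^{-T}
    a22 = rec_syrk(a[h:, h:], l21, leaf)           # SYRK: A22 - L21 L21^T
    l22 = rec_cholesky(a22, leaf)                  # POTRF on the Schur complement
    l = np.zeros_like(a)
    l[:h, :h] = l11
    l[h:, :h] = l21
    l[h:, h:] = l22
    return l
```

For example, `rec_cholesky(A, leaf=32)` on a 300×300 symmetric positive-definite matrix should agree with `np.linalg.cholesky(A)` up to rounding error.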
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Numerical Methods and Algorithms
