Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs

Christie Alappat; Jonas Thies; Georg Hager; Holger Fehske; Gerhard Wellein

arXiv:2309.02228·math.NA·May 12, 2026


TL;DR

This paper introduces an algebraic cache-blocking technique for sparse iterative solvers on multi-core CPUs, significantly accelerating matrix polynomial evaluations crucial for large-scale simulations.

Contribution

It demonstrates the integration of the RACE framework into Trilinos, achieving up to 3x speedups in MPK-dominated algorithms and applying this to real-world wind turbine simulations.

Findings

01 Achieved up to 3x speedup in MPK-based algorithms on multi-core systems.

02 Speedups shrink when subspace orthogonalization dominates the runtime, which is attributed to the quality of the orthogonalization routines.

03 Applied RACE-accelerated solvers successfully in a wind turbine simulation.

Abstract

Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation of matrix polynomials via a sequence of sparse matrix-vector products. A variety of approaches has been proposed to make these polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial preconditioners or s-step Krylov methods. Furthermore, it is now common practice to approximate triangular solves by a matrix polynomial to increase parallelism. Such algorithms allow the polynomial to be evaluated using a so-called matrix power kernel (MPK), which computes the product between a power of a sparse matrix A and a dense vector x, or a related operation. Recently we have shown that using the level-based formulation of sparse matrix-vector multiplications in the Recursive Algebraic Coloring Engine (RACE)…
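To make the MPK operation in the abstract concrete, here is a minimal sketch of a naive matrix power kernel: it computes x, Ax, ..., A^p x with p back-to-back sparse matrix-vector products on a CSR matrix, then combines them with fixed coefficients to apply a matrix polynomial. This is only the baseline formulation, not RACE's level-based cache-blocked scheme and not the Trilinos integration. The function names (`csr_spmv`, `naive_mpk`, `poly_apply`) and the small test matrix are assumptions chosen for illustration.

```python
# Minimal sketch of a naive matrix power kernel (MPK) on a CSR matrix.
# All function names are hypothetical; this is NOT RACE's blocked algorithm.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a matrix A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def naive_mpk(row_ptr, col_idx, vals, x, p):
    """Return [x, A x, A^2 x, ..., A^p x] using p back-to-back SpMVs."""
    powers = [list(x)]
    for _ in range(p):
        # Each step sweeps over the entire matrix once.
        powers.append(csr_spmv(row_ptr, col_idx, vals, powers[-1]))
    return powers


def poly_apply(row_ptr, col_idx, vals, x, coeffs):
    """Evaluate (c_0 I + c_1 A + ... + c_p A^p) x from the MPK output."""
    powers = naive_mpk(row_ptr, col_idx, vals, x, len(coeffs) - 1)
    n = len(x)
    return [sum(c * v[i] for c, v in zip(coeffs, powers)) for i in range(n)]


# Illustrative input: 4x4 tridiagonal matrix tridiag(-1, 2, -1) in CSR form.
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 1.0, 1.0, 1.0]

powers = naive_mpk(row_ptr, col_idx, vals, x, 2)
result = poly_apply(row_ptr, col_idx, vals, x, [1.0, 1.0, 1.0])
```

In this baseline, every power triggers a full sweep over the matrix, so a matrix too large for cache is streamed from main memory p times. Temporal cache blocking targets exactly that repeated traffic by reusing matrix data across successive powers while it is still in cache.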
