Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs
Christie Alappat, Jonas Thies, Georg Hager, Holger Fehske, Gerhard Wellein

TL;DR
This paper introduces an algebraic cache-blocking technique for sparse iterative solvers on multi-core CPUs that significantly accelerates the matrix polynomial evaluations on which many large-scale simulations depend.
Contribution
The paper integrates the Recursive Algebraic Coloring Engine (RACE) into Trilinos, achieves speedups of up to 3x in algorithms dominated by the matrix power kernel (MPK), and applies the resulting solvers to a real-world wind turbine simulation.
Findings
Achieved up to 3x speedup in MPK-based algorithms on multi-core systems.
Speedups are smaller when subspace orthogonalization dominates the runtime, owing to the quality of the available orthogonalization routines.
Applied RACE-accelerated solvers successfully in a wind turbine simulation.
Abstract
Sparse linear iterative solvers are essential for many large-scale simulations. Much of the runtime of these solvers is often spent in the implicit evaluation of matrix polynomials via a sequence of sparse matrix-vector products. A variety of approaches has been proposed to make these polynomial evaluations explicit (i.e., fix the coefficients), e.g., polynomial preconditioners or s-step Krylov methods. Furthermore, it is nowadays common practice to approximate triangular solves by a matrix polynomial to increase parallelism. Such algorithms allow the polynomial to be evaluated using a so-called matrix power kernel (MPK), which computes the product of a power of a sparse matrix A and a dense vector x, or a related operation. Recently we have shown that using the level-based formulation of sparse matrix-vector multiplications in the Recursive Algebraic Coloring Engine (RACE)…
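To make the MPK concrete, here is a minimal pure-Python sketch, assuming CSR storage and a structurally symmetric sparsity pattern. `mpk_naive` computes x, Ax, ..., A^p x with p full sparse matrix-vector sweeps, and `poly_apply` then evaluates an explicit polynomial c_0 x + c_1 Ax + ... + c_p A^p x from those powers. `mpk_level_blocked` is only a simplified illustration of the level-based idea: rows are grouped into breadth-first-search levels of the matrix graph, so a row in level l couples only to levels l-1, l and l+1, and power k on level l can be computed as soon as power k-1 is available on those three levels. All function names are hypothetical; this is not RACE's actual algorithm or API.

```python
from collections import deque

def spmv_rows(rowptr, col, val, x, y, rows):
    """Set y[i] = sum_j A[i, j] * x[j] for every i in `rows` (A stored in CSR)."""
    for i in rows:
        s = 0.0
        for jj in range(rowptr[i], rowptr[i + 1]):
            s += val[jj] * x[col[jj]]
        y[i] = s

def mpk_naive(rowptr, col, val, x, p):
    """Return [x, Ax, A^2 x, ..., A^p x] using p full SpMV sweeps over A."""
    n = len(x)
    powers = [list(x)]
    for _ in range(p):
        y = [0.0] * n
        spmv_rows(rowptr, col, val, powers[-1], y, range(n))
        powers.append(y)
    return powers

def poly_apply(coeffs, powers):
    """Evaluate c_0 x + c_1 Ax + ... + c_k A^k x from precomputed powers."""
    n = len(powers[0])
    return [sum(c * powers[k][i] for k, c in enumerate(coeffs)) for i in range(n)]

def bfs_levels(rowptr, col, n):
    """Group rows into BFS levels of the matrix graph.

    Assumes a structurally symmetric pattern, so every neighbour of a row
    in level l lies in level l-1, l or l+1.
    """
    level = [-1] * n
    for root in range(n):  # restart the BFS for each disconnected component
        if level[root] != -1:
            continue
        level[root] = 0
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for jj in range(rowptr[i], rowptr[i + 1]):
                j = col[jj]
                if level[j] == -1:
                    level[j] = level[i] + 1
                    queue.append(j)
    levels = [[] for _ in range(max(level) + 1)]
    for i, l in enumerate(level):
        levels[l].append(i)
    return levels

def mpk_level_blocked(rowptr, col, val, x, p):
    """Same result as mpk_naive, computed in a wavefront over the BFS levels."""
    n = len(x)
    levels = bfs_levels(rowptr, col, n)
    nlev = len(levels)
    powers = [list(x)] + [[0.0] * n for _ in range(p)]
    for t in range(nlev + p - 1):
        for k in range(1, p + 1):  # ascending k: power k-1 on level l+1 is done first
            l = t - (k - 1)
            if 0 <= l < nlev:
                # A^k x on level l reads A^(k-1) x on levels l-1, l, l+1, which
                # (for k > 1) were computed at steps t-2, t-1 and earlier in step t.
                spmv_rows(rowptr, col, val, powers[k - 1], powers[k], levels[l])
    return powers

def laplacian_1d(n):
    """CSR arrays for the tridiagonal matrix tridiag(-1, 2, -1) of size n."""
    rowptr, col, val = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col.append(j)
                val.append(v)
        rowptr.append(len(col))
    return rowptr, col, val

if __name__ == "__main__":
    rowptr, col, val = laplacian_1d(8)
    x = [1.0] * 8
    naive = mpk_naive(rowptr, col, val, x, 3)
    blocked = mpk_level_blocked(rowptr, col, val, x, 3)
    print(naive == blocked)  # True: same rows, same inputs, same summation order
    print(naive[2])  # [2.0, -1.0, 0.0, 0.0, 0.0, 0.0, -1.0, 2.0]
```

In this sketch each level is its own block. The point of the wavefront order is data reuse: a cache-blocked version would group consecutive levels so that the rows of A and the vector entries touched by several powers are reused while still in cache, instead of streaming the whole matrix once per power as `mpk_naive` does.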
