Strategies for the vectorized Block Conjugate Gradients method
Nils-Arne Dreier, Christian Engwer

TL;DR
This paper reviews and applies a block Krylov framework to the Block Conjugate Gradients method, addressing modern hardware challenges and providing a performance model with experimental validation.
Contribution
It reviews the block Krylov framework of Frommer et al., which combines block Krylov and data parallel methods, applies it to Block CG, and presents a performance model that accounts for modern hardware constraints.
Findings
Performance model accurately predicts efficiency of Block CG variants.
Experimental results validate the proposed performance predictions.
Hardware challenges such as limited memory bandwidth, SIMD utilization, and communication overhead are taken into account.
Abstract
Block Krylov methods have recently gained a lot of attention. Due to their increased arithmetic intensity, they offer a promising way to improve performance on modern hardware. Recently, Frommer et al. presented a block Krylov framework that combines the advantages of block Krylov methods and data parallel methods. We review this framework and apply it to the Block Conjugate Gradients method to solve linear systems with multiple right-hand sides. In doing so, we consider challenges that occur on modern hardware, such as limited memory bandwidth, the use of SIMD instructions, and communication overhead. We present a performance model to predict the efficiency of different Block CG variants and compare these predictions with experimental numerical results.
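To make the idea of solving for several right-hand sides at once concrete, the following is a minimal sketch of the classical Block CG iteration in NumPy. It is not the authors' implementation and does not reproduce the variants from the Frommer et al. framework. The function name `block_cg`, its parameters, and the stopping criterion are assumptions chosen for illustration. The matrix A is assumed to be symmetric positive definite and the columns of B linearly independent.

```python
import numpy as np

def block_cg(A, B, tol=1e-10, max_iter=200):
    """Illustrative classical Block CG for A X = B (hypothetical helper).

    A: (n, n) symmetric positive definite matrix.
    B: (n, s) block of s right-hand sides.
    """
    X = np.zeros_like(B, dtype=float)
    R = B - A @ X              # block residual, shape (n, s)
    P = R.copy()               # block search direction
    rho = R.T @ R              # small (s, s) Gram matrix of the residual
    b_norm = np.linalg.norm(B, axis=0)
    for _ in range(max_iter):
        Q = A @ P              # one pass over A serves all s columns
        alpha = np.linalg.solve(P.T @ Q, rho)   # (s, s) step matrix
        X += P @ alpha
        R -= Q @ alpha
        # Stop once every column meets the relative residual tolerance.
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norm):
            break
        rho_new = R.T @ R
        beta = np.linalg.solve(rho, rho_new)    # (s, s) update matrix
        P = R + P @ beta
        rho = rho_new
    return X
```

The product `A @ P` applies the operator to all s columns in a single pass, so each entry of A that is loaded from memory is reused s times. That reuse is the source of the higher arithmetic intensity mentioned in the abstract. The scalar coefficients of ordinary CG become small s-by-s systems here. This sketch omits any safeguard against rank deficiency of the residual block.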
