Strategies for the vectorized Block Conjugate Gradients method

Nils-Arne Dreier; Christian Engwer

arXiv:1912.11930·math.NA·December 30, 2019

TL;DR

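This paper reviews the block Krylov framework of Frommer et al. and applies it to the Block Conjugate Gradients method for linear systems with multiple right-hand sides. It considers modern hardware constraints (limited memory bandwidth, SIMD instructions, communication overhead) and presents a performance model that is compared with experimental results.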

Contribution

It applies an existing framework (due to Frommer et al.) that combines block Krylov and data parallel methods to Block CG, and contributes a performance model that predicts the efficiency of different Block CG variants on modern hardware.

Findings

1. A performance model predicts the efficiency of different Block CG variants.

2. The model's predictions are compared with experimental numerical results.

3. The analysis accounts for hardware constraints: limited memory bandwidth, the use of SIMD instructions, and communication overhead.

Abstract

Block Krylov methods have recently gained a lot of attention. Due to their increased arithmetic intensity they offer a promising way to improve performance on modern hardware. Recently Frommer et al. presented a block Krylov framework that combines the advantages of block Krylov methods and data parallel methods. We review this framework and apply it to the Block Conjugate Gradients method to solve linear systems with multiple right-hand sides. In doing so we consider challenges that occur on modern hardware, such as limited memory bandwidth, the use of SIMD instructions, and communication overhead. We present a performance model to predict the efficiency of different Block CG variants and compare its predictions with experimental numerical results.
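
For readers unfamiliar with the method, the following is a minimal NumPy sketch of the textbook Block CG recurrence (in the style usually attributed to O'Leary) for solving A X = B with a symmetric positive definite A and several right-hand sides at once. It is not the vectorized variant or the framework studied in the paper. The function name `block_cg`, its parameters, and the test matrix are assumptions made for illustration only.

```python
import numpy as np

def block_cg(A, B, tol=1e-8, max_iter=200):
    """Textbook Block CG sketch for A X = B, A symmetric positive definite.

    Hypothetical helper for illustration; not the paper's implementation.
    B holds one right-hand side per column.
    """
    X = np.zeros_like(B, dtype=float)
    R = B - A @ X                      # block residual, one column per system
    P = R.copy()                       # block search direction
    rho = R.T @ R                      # small s-by-s Gram matrix
    b_norms = np.linalg.norm(B, axis=0)
    for k in range(max_iter):
        Q = A @ P                      # operator applied to all s columns at once
        alpha = np.linalg.solve(P.T @ Q, rho)
        X += P @ alpha
        R -= Q @ alpha
        # Stop when every column meets the relative residual tolerance.
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norms):
            return X, k + 1
        rho_new = R.T @ R
        beta = np.linalg.solve(rho, rho_new)
        P = R + P @ beta
        rho = rho_new
    return X, max_iter

# Assumed test problem: shifted 1D Laplacian (eigenvalues between 1 and 5).
n, s = 200, 4
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
B = rng.standard_normal((n, s))
X, iters = block_cg(A, B)
print("iterations:", iters)
print("max residual:", np.abs(A @ X - B).max())
```

The product `A @ P` touches the matrix once for all s columns, which is where the higher arithmetic intensity mentioned in the abstract comes from. This simple form can break down when the columns of the residual block become linearly dependent, so a practical implementation would need some form of stabilization or deflation.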
