Parallel sparse matrix-vector multiplication as a test case for hybrid MPI+OpenMP programming

Gerald Schubert; Georg Hager; Holger Fehske; Gerhard Wellein

arXiv:1101.0091·cs.PF·March 1, 2012

TL;DR

This paper evaluates optimized parallel sparse matrix-vector multiplication on multicore clusters. It shows that a dedicated communication thread achieves explicit overlap of communication and computation, which nonblocking MPI alone does not deliver with standard MPI implementations, and it compares this approach to pure MPI and the "vector-like" hybrid strategy.

Contribution

It introduces a hybrid MPI+OpenMP approach in which a dedicated communication thread, which may run on a virtual core, explicitly overlaps communication with computation in sparse matrix-vector multiplication.
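The structure of this overlap can be illustrated with a minimal single-process sketch. It is not the authors' MPI+OpenMP implementation: a Python thread stands in for the dedicated communication thread, a plain copy stands in for the MPI halo exchange, and the names `spmv_csr`, `fetch_halo`, and `overlapped_spmv` are hypothetical. It assumes the locally owned matrix rows are split into a part that needs only local vector entries and a part that needs remote ones.

```python
import threading


def spmv_csr(row_ptr, col_idx, values, x, y):
    """Accumulate y += A @ x for a matrix A stored in CSR form."""
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] += acc


def fetch_halo(remote_x, halo):
    """Stand-in for the halo exchange (assumption: a real code would post
    MPI sends and receives here and wait for them to complete)."""
    halo[:] = remote_x


def overlapped_spmv(local_csr, remote_csr, x_local, remote_x):
    """Compute the local part of y while a separate thread fetches the halo."""
    y = [0.0] * (len(local_csr[0]) - 1)
    halo = [0.0] * len(remote_x)

    # The dedicated communication thread handles only the data transfer.
    comm = threading.Thread(target=fetch_halo, args=(remote_x, halo))
    comm.start()

    # Meanwhile, multiply the part of the matrix that needs no remote data.
    spmv_csr(*local_csr, x_local, y)

    # Remote entries are required from here on, so wait for the transfer.
    comm.join()
    spmv_csr(*remote_csr, halo, y)
    return y


# Rows [[2, 0 | 1, 0], [0, 3 | 0, 4]]: left columns local, right columns remote.
local_csr = ([0, 1, 2], [0, 1], [2.0, 3.0])
remote_csr = ([0, 1, 2], [0, 1], [1.0, 4.0])  # column indices into the halo
y = overlapped_spmv(local_csr, remote_csr, [1.0, 2.0], [3.0, 4.0])
print(y)
```

Python threads do not execute the arithmetic in parallel, so the sketch shows only the ordering of the steps (start transfer, compute local part, wait, compute remote part), not any speedup.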

Findings

01 Explicit overlap of communication and computation improves performance

02 A dedicated communication thread hides communication cost that nonblocking MPI alone does not

03 Hybrid approach surpasses pure MPI in scalability

Abstract

We evaluate optimized parallel sparse matrix-vector operations for two representative application areas on widespread multicore-based cluster configurations. First, the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Going beyond the single node, parallel sparse matrix-vector operations often suffer from an unfavorable communication-to-computation ratio. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. We compare our approach to pure MPI and the widely used "vector-like" hybrid programming strategy.
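A single-socket baseline of the kind the abstract mentions is often estimated with a simple bandwidth-bound model. The following is a minimal sketch of one such estimate, not necessarily the model used in the paper. The traffic assumptions, the function names `csr_code_balance` and `bandwidth_bound_flops`, and the example inputs are all hypothetical.

```python
def csr_code_balance(nnz_per_row, value_bytes=8, index_bytes=4):
    """Estimated memory traffic per flop (bytes/flop) for one CSR row.

    Assumptions (not from the source): each nonzero streams one value and one
    column index; each row also loads one x entry and reads and writes one y
    entry; row-pointer traffic is ignored; each nonzero costs 2 flops.
    """
    bytes_per_row = nnz_per_row * (value_bytes + index_bytes) + 3 * value_bytes
    flops_per_row = 2 * nnz_per_row
    return bytes_per_row / flops_per_row


def bandwidth_bound_flops(memory_bandwidth, nnz_per_row):
    """Upper bound on flop/s when memory bandwidth (bytes/s) is the only limit."""
    return memory_bandwidth / csr_code_balance(nnz_per_row)


# Hypothetical inputs: 21 GB/s socket bandwidth, 12 nonzeros per row.
bound = bandwidth_bound_flops(21e9, 12)
print(f"{bound / 1e9:.1f} GFlop/s")
```

Under these assumptions the bound depends only on the memory bandwidth and the average number of nonzeros per row, which is why sparse matrix-vector multiplication is usually treated as a memory-bound kernel.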
