Performance of MPI sends of non-contiguous data
Victor Eijkhout

TL;DR
This paper investigates how different MPI derived datatype schemes perform for non-contiguous data transfers, revealing that a combination of packing and derived types yields optimal efficiency for large messages.
Contribution
It provides an experimental comparison of MPI datatype schemes and identifies the most efficient approach for large message transfers.
Findings
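For messages up to the megabyte range, most schemes perform comparably to each other and to manual copying into a regular send buffer.
For large messages, MPI's internal buffering causes differences in efficiency between the schemes.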
Combining packing with derived types is optimal for large data.
Abstract
We present an experimental investigation of the performance of MPI derived datatypes. For messages up to the megabyte range most schemes perform comparably to each other and to manual copying into a regular send buffer. However, for large messages the internal buffering of MPI causes differences in efficiency. The optimal scheme is a combination of packing and derived types.
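To make the baseline concrete, here is a minimal sketch of what "manual copying into a regular send buffer" could look like for strided data. The helper name `pack_strided` and its parameters are hypothetical and do not come from the paper; the sketch assumes the non-contiguous data consists of equally sized blocks of doubles at a fixed stride. The resulting contiguous buffer could then be handed to an ordinary send, whereas a derived-type scheme would instead describe the same layout to MPI (for example with `MPI_Type_vector`) and let the library gather the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the paper): gather `count` blocks of
 * `blocklen` doubles, whose starting points are `stride` doubles apart
 * in `src`, into the contiguous buffer `dst`.
 * Returns the number of doubles copied. */
size_t pack_strided(const double *src, size_t count, size_t blocklen,
                    size_t stride, double *dst)
{
    assert(stride >= blocklen); /* assumption: blocks do not overlap */
    for (size_t i = 0; i < count; i++) {
        /* Copy block i from its strided position to the next free slot. */
        memcpy(dst + i * blocklen, src + i * stride,
               blocklen * sizeof(double));
    }
    return count * blocklen;
}
```

For instance, with `count = 3`, `blocklen = 2`, and `stride = 4` applied to an array holding 0 through 11, the destination buffer receives 0, 1, 4, 5, 8, 9.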
