Network-Accelerated Non-Contiguous Memory Transfers

Salvatore Di Girolamo; Konstantin Taranov; Andreas Kurth; Michael Schaffner; Timo Schneider; Jakub Beránek; Maciej Besta; Luca Benini; Duncan Roweth; Torsten Hoefler

arXiv:1908.08590·cs.NI·August 26, 2019

TL;DR

This paper demonstrates that non-contiguous memory transfers in HPC applications can be significantly accelerated by offloading their processing to the network interface card (NIC), achieving up to 10x higher unpack throughput for real applications and enabling truly zero-copy communication.

Contribution

It introduces a method to transparently offload non-contiguous memory transfers to the NIC using sPIN, a packet streaming processor, enabling network acceleration of MPI datatype processing.

Findings

1. Up to 10x speedup in unpack throughput for real applications.
2. Non-contiguous transfers are viable candidates for network acceleration.
3. sPIN is implemented and extended within a Portals 4 NIC model in the SST simulator.

Abstract

Applications often communicate data that is non-contiguous in the send- or the receive-buffer, e.g., when exchanging a column of a matrix stored in row-major order. While non-contiguous transfers are well supported in HPC (e.g., MPI derived datatypes), they can still be up to 5x slower than contiguous transfers of the same size. As we enter the era of network acceleration, we need to investigate which tasks to offload to the NIC: In this work we argue that non-contiguous memory transfers can be transparently network-accelerated, truly achieving zero-copy communications. We implement and extend sPIN, a packet streaming processor, within a Portals 4 NIC SST model, and evaluate strategies for NIC-offloaded processing of MPI datatypes, ranging from datatype-specific handlers to general solutions for any MPI datatype. We demonstrate up to 10x speedup in the unpack throughput of real…