Offloading MPI Parallel Prefix Scan (MPI_Scan) with the NetFPGA
Omer Arap, Martin Swany

TL;DR
This paper presents a hardware-accelerated implementation of MPI_Scan using NetFPGA, demonstrating potential performance improvements for collective operations in MPI applications.
Contribution
It introduces a novel approach to offload MPI_Scan to programmable network hardware, enhancing efficiency in MPI collective operations.
Findings
Hardware offloading reduces MPI_Scan execution time.
The NetFPGA implementation outperforms MPI over Ethernet in tests on a small configuration.
Potential for improved parallel application performance.
Abstract
Parallel programs written using the standard Message Passing Interface (MPI) frequently depend upon the ability to execute collective operations efficiently. MPI_Scan is a collective operation defined in MPI that implements parallel prefix scan, a very useful primitive operation in several parallel applications. This operation can be very time consuming. In this paper, we explore the use of hardware programmable network interface cards utilizing standard media access protocols for offloading the MPI_Scan operation to the underlying network. Our work is based upon the NetFPGA, a programmable network interface with an on-board Virtex FPGA and four Ethernet interfaces. We have implemented a network-level MPI_Scan operation using the NetFPGA for use in MPI environments. This paper compares the performance of this implementation with MPI over Ethernet for a small configuration.
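To make the prefix-scan semantics in the abstract concrete, the following is a minimal Python sketch of what an inclusive scan computes: the process at rank i ends up with the combination of the contributions from ranks 0 through i. It simulates ranks as list positions and does not use MPI or the NetFPGA. The function names `reference_scan` and `log_step_scan` are hypothetical, and the log-step variant is a generic Hillis-Steele style schedule shown only for illustration, not the design described in the paper.

```python
from operator import add


def reference_scan(contributions, op=add):
    """Inclusive prefix scan: result[i] = contributions[0] op ... op contributions[i]."""
    results = []
    running = None
    for rank, value in enumerate(contributions):
        # Rank 0 keeps its own value; every later rank folds in its contribution.
        running = value if rank == 0 else op(running, value)
        results.append(running)
    return results


def log_step_scan(contributions, op=add):
    """Simulate a log-step scan (assumed schedule, not the paper's hardware design).

    In each round, every rank r >= distance combines the partial result held by
    rank r - distance with its own; the distance doubles after each round.
    """
    partial = list(contributions)
    distance = 1
    while distance < len(partial):
        # Snapshot the state so all "messages" in a round carry pre-round values.
        previous = list(partial)
        for rank in range(distance, len(partial)):
            # Left operand comes from the lower rank, preserving operand order.
            partial[rank] = op(previous[rank - distance], previous[rank])
        distance *= 2
    return partial


if __name__ == "__main__":
    per_rank_values = [3, 1, 4, 1, 5]  # one contribution per simulated rank
    print(reference_scan(per_rank_values))
    print(log_step_scan(per_rank_values))
```

With addition as the operator and the five sample values above, both functions return `[3, 4, 8, 9, 14]`. The log-step version needs about log2(n) communication rounds instead of n - 1 sequential steps, which gives a sense of why the cost of this collective depends heavily on how the underlying network handles it.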
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
