Processing Particle Data Flows with SmartNICs
Jianshen Liu, Carlos Maltzahn, Matthew L. Curry, Craig Ulmer

TL;DR
This paper explores using Apache Arrow on SmartNICs, specifically the BlueField-2, to process complex data flows, and shows that the card's (de)compression hardware can benefit workflows that unpack and process data in transit.
Contribution
It advocates Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs, and reports on adapting a particle-data partitioning algorithm to Arrow and measuring its on-card processing performance on the BlueField-2.
Findings
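The BlueField-2's (de)compression hardware can significantly improve data unpacking in in-transit workflows.
Apache Arrow provides a flexible foundation for implementing data-flow tasks.
SmartNICs offer an opportunity to offload data-flow tasks from hosts, freeing the hosts for other work.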
Abstract
Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2's (de)compression hardware can have a significant impact on in-transit workflows where data…
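The in-transit pattern the abstract describes (unpack data, partition it, repack it for downstream consumers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a plain dict of lists stands in for an Arrow record batch, software `zlib` stands in for the BlueField-2's (de)compression hardware, and the function names and the `key % num_parts` routing rule are hypothetical.

```python
import json
import zlib


def unpack(payload: bytes) -> dict:
    """Decompress a payload into a columnar batch (column name -> list)."""
    return json.loads(zlib.decompress(payload))


def pack(batch: dict) -> bytes:
    """Serialize and compress a columnar batch for the next hop."""
    return zlib.compress(json.dumps(batch).encode("utf-8"))


def partition(batch: dict, key: str, num_parts: int) -> list:
    """Split a columnar batch into num_parts batches by key % num_parts.

    The modulo rule is an assumption for illustration only.
    """
    parts = [{name: [] for name in batch} for _ in range(num_parts)]
    for row, key_value in enumerate(batch[key]):
        target = parts[key_value % num_parts]
        for name, column in batch.items():
            target[name].append(column[row])
    return parts


def process_in_transit(payload: bytes, key: str, num_parts: int) -> list:
    """Unpack, partition, and repack one payload; one output per consumer."""
    return [pack(part) for part in partition(unpack(payload), key, num_parts)]


# Hypothetical particle batch: one integer id column and one position column.
particles = {"id": [0, 1, 2, 3, 4, 5], "x": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]}
outputs = process_in_transit(pack(particles), key="id", num_parts=2)
```

In this sketch every hop pays for one decompression and one compression per output, which is why offloading those two steps to dedicated hardware matters for this kind of workflow.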
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Software System Performance and Reliability
