Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory
Philip Groet, Joost Hoozemans, Andreas Grapentin, Felix Eberhardt, Zaid Al-Ars, H. Peter Hofstee

TL;DR
This paper presents a distributed Apache Arrow implementation that uses cluster-shared, load-store addressable memory (hardware-coherent only within each node), exposed through ThymesisFlow and OpenCAPI, to enable zero-copy, zero-serialization data sharing across the nodes of a cluster.
Contribution
It introduces a novel distributed Apache Arrow system, built on the ThymesisFlow prototype and the OpenCAPI interface, that leverages cluster-shared memory whose hardware coherence extends only within each node.
Findings
Enables zero-copy data sharing across cluster nodes
Creates distributed Apache Arrow tables accessible in each node
Uses cluster-shared, load-store addressable memory that is hardware-coherent only within each node
Abstract
This paper describes a distributed implementation of Apache Arrow that can leverage cluster-shared load-store addressable memory that is hardware-coherent only within each node. The implementation is built on the ThymesisFlow prototype that leverages the OpenCAPI interface to create a shared address space across a cluster. While Apache Arrow structures are immutable, simplifying their use in a cluster shared memory, this paper creates distributed Apache Arrow tables and makes them accessible in each node.
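The role immutability plays in zero-copy sharing can be illustrated with a minimal single-node sketch. It is an analogy only: it uses Python's standard `multiprocessing.shared_memory` module rather than Apache Arrow, ThymesisFlow, or OpenCAPI, and the buffer layout (an 8-byte row count followed by native-endian int64 values) and the function names `publish_column` and `column_sum` are hypothetical, chosen for illustration. A writer fills a column once; readers then attach by name and interpret the bytes in place, with no copy and no deserialization step.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: one native int64 row count, then the int64 values.
HEADER = struct.Struct("q")


def publish_column(values):
    """Write an int64 column into a new shared segment exactly once."""
    nbytes = HEADER.size + 8 * len(values)
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    HEADER.pack_into(shm.buf, 0, len(values))
    view = shm.buf[HEADER.size:nbytes].cast("q")
    try:
        for i, v in enumerate(values):
            view[i] = v
    finally:
        view.release()  # drop the typed view so the segment can be closed later
    return shm


def column_sum(name):
    """Attach to an existing segment and sum the column without copying it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (n,) = HEADER.unpack_from(shm.buf, 0)
        # Typed view over the shared bytes; no copy, no deserialization.
        view = shm.buf[HEADER.size:HEADER.size + 8 * n].cast("q")
        try:
            return sum(view)
        finally:
            view.release()
    finally:
        shm.close()


if __name__ == "__main__":
    segment = publish_column([1, 2, 3, 4])
    try:
        print(column_sum(segment.name))
    finally:
        segment.close()
        segment.unlink()
```

Because the column is never modified after it is published, readers in this sketch need no locking. On a cluster whose memory is hardware-coherent only within each node, a reader on another node would additionally need some assurance that the writer's stores are visible to it; this sketch does not model that.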
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Scientific Computing and Data Management
