Memory-Disaggregated In-Memory Object Store Framework for Big Data Applications
Robin Abrahamse, Akos Hadnagy, Zaid Al-Ars

TL;DR
This paper introduces a memory-disaggregated in-memory object store framework for big data applications. Built on ThymesisFlow and Apache Arrow Plasma, it lets clients share data objects across compute nodes with only a modest performance penalty.
Contribution
The framework extends Apache Arrow Plasma to distributed systems through memory disaggregation, so that clients can produce and consume data objects across multiple compute nodes easily and efficiently.
Findings
Remote memory access incurs only a modest throughput penalty (~6.5 vs ~5.75 GiB/s).
Framework enables efficient data sharing across nodes in big data applications.
Open-source implementation available for further research and development.
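The two throughput figures in the findings can be expressed as a relative penalty. The following is a minimal sketch, assuming the higher figure (~6.5 GiB/s) is the local baseline and the lower figure (~5.75 GiB/s) is remote access, which this summary does not state explicitly. The function name `relative_penalty` is hypothetical and used only for illustration.

```python
def relative_penalty(baseline_gib_s: float, degraded_gib_s: float) -> float:
    """Return the fractional throughput loss relative to the baseline."""
    if baseline_gib_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_gib_s - degraded_gib_s) / baseline_gib_s


# Assumption: 6.5 GiB/s is local access, 5.75 GiB/s is remote access.
penalty = relative_penalty(6.5, 5.75)
print(f"Relative throughput penalty: {penalty:.1%}")
```

Under that assumption the difference of 0.75 GiB/s corresponds to a loss of roughly 11.5% relative to the faster figure.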
Abstract
The concept of memory disaggregation has recently been gaining traction in research. With memory disaggregation, data center compute nodes can directly access memory on adjacent nodes and are therefore able to overcome local memory restrictions, introducing a new data management paradigm for distributed computing. This paper proposes and demonstrates a memory disaggregated in-memory object store framework for big data applications by leveraging the newly introduced ThymesisFlow memory disaggregation system. The framework extends the functionality of the pre-existing Apache Arrow Plasma object store framework to distributed systems by enabling clients to easily and efficiently produce and consume data objects across multiple compute nodes. This allows big data applications to increasingly leverage parallel processing at reduced development costs. In addition, the paper includes latency…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Advanced Data Storage Technologies
