PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
Jia Zou, R. Matthew Barnett, Tania Lorido-Botran, Shangyu Luo, Carlos Monroy, Sourav Sikdar, Kia Teymourian, Binhang Yuan, Chris Jermaine

TL;DR
PlinyCompute is a high-performance, distributed data-intensive system offering a high-level declarative interface and a low-level object model, enabling efficient development of reusable tools with significant speed advantages over Spark.
Contribution
The paper introduces a hybrid system that combines declarative, database-style optimization with a custom memory model, and reports that it outperforms JVM-based systems such as Spark on data-intensive computations.
Findings
Achieves 2x to over 50x speedup compared to Spark.
Provides a high-level declarative interface with a custom object model.
Demonstrates efficient development of reusable data tools.
Abstract
This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual…
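As a rough illustration of why control over layout and de/allocation matters, the sketch below shows one general technique that systems of this kind can use: records are written in place into a single contiguous block and addressed by byte offset, so the block's bytes are already its storage and wire format. This is not PlinyCompute's API. The class name `Block`, the record layout, and every method name are hypothetical, and whether the PC object model works this way is an assumption not confirmed by the abstract above.

```python
import struct


class Block:
    """Hypothetical contiguous allocation block; records are addressed by byte offset."""

    # Assumed record layout for illustration: little-endian int64 key, float64 value.
    RECORD = struct.Struct("<qd")

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)
        self.used = 0  # bump pointer: offset of the next free byte

    def alloc(self, key: int, value: float) -> int:
        """Write one record in place and return its offset, which serves as a handle."""
        offset = self.used
        if offset + self.RECORD.size > len(self.buf):
            raise MemoryError("block full")
        self.RECORD.pack_into(self.buf, offset, key, value)
        self.used = offset + self.RECORD.size
        return offset

    def read(self, offset: int) -> tuple:
        """Decode the record stored at the given offset."""
        return self.RECORD.unpack_from(self.buf, offset)

    def to_bytes(self) -> bytes:
        """Return the occupied prefix; no per-object serialization step is needed."""
        return bytes(self.buf[:self.used])

    @classmethod
    def from_bytes(cls, data: bytes) -> "Block":
        """Rebuild a block from raw bytes; previously issued offsets remain valid."""
        block = cls(len(data))
        block.buf[:] = data
        block.used = len(data)
        return block


if __name__ == "__main__":
    local = Block(64)
    first = local.alloc(1, 2.5)
    second = local.alloc(7, -1.0)
    remote = Block.from_bytes(local.to_bytes())  # stands in for a network or disk hop
    print(remote.read(first), remote.read(second))
```

Because handles are offsets rather than machine addresses, they stay meaningful after the block is copied to another process. A JVM-based system, by contrast, generally has to serialize and deserialize object graphs at that boundary.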
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies
