Building your Cross-Platform Application with RHEEM
Sanjay Chawla, Bertty Contreras-Rojas, Zoi Kaoudi, Sebastian Kruse, Jorge-Arnulfo Quiané-Ruiz

TL;DR
Rheem is a versatile cross-platform data processing system that automatically optimizes and orchestrates tasks across multiple platforms, simplifying data analytics and reducing costs.
Contribution
The paper presents a complete system comprising a robust interface, a novel cost-based optimizer, and an executor for efficient cross-platform data processing.
Findings
Effective platform selection minimizes runtime and costs.
Rheem simplifies complex data processing workflows.
Open source release promotes adoption and further development.
Abstract
Today, organizations typically perform tedious and costly tasks to juggle their code and data across different data processing platforms. Addressing this pain and achieving automatic cross-platform data processing is challenging because it requires considerable expertise in all the available data processing platforms. In this report, we present Rheem, a general-purpose cross-platform data processing system that relieves users of the pain of finding the most efficient data processing platform for a given task. It also splits a task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). To offer cross-platform functionality, it features (i) a robust interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an…
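The idea of splitting a task into subtasks and assigning each to a platform so that overall cost is minimized can be illustrated with a toy sketch. This is not Rheem's optimizer or API: the function `assign_platforms`, the platform names, the per-operator cost numbers, and the flat `move_cost` charged on every platform switch are all assumptions made up for illustration, and the pipeline is assumed to be a simple linear chain of operators.

```python
from typing import Dict, List, Tuple


def assign_platforms(
    op_costs: List[Dict[str, float]],
    move_cost: float,
) -> Tuple[float, List[str]]:
    """Pick one platform per operator of a linear pipeline.

    Minimizes the sum of per-operator execution costs plus a flat
    (hypothetical) data-movement cost for each platform switch.
    """
    # best[p] = (cheapest total cost of a plan ending on platform p, that plan)
    best = {p: (c, [p]) for p, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for p, c in costs.items():
            # Extend the cheapest predecessor plan, paying move_cost on a switch.
            prev_cost, prev_plan = min(
                (
                    (pc + (0.0 if q == p else move_cost), plan)
                    for q, (pc, plan) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[p] = (prev_cost + c, prev_plan + [p])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Illustrative, made-up cost estimates for a three-operator task.
estimates = [
    {"java_streams": 1.0, "spark": 5.0},   # read a small input
    {"java_streams": 20.0, "spark": 4.0},  # heavy join
    {"java_streams": 1.0, "spark": 4.0},   # collect a small result
]
cost, plan = assign_platforms(estimates, move_cost=2.0)
print(cost, plan)
```

With these made-up numbers, the mixed plan is cheaper than running everything on either single platform (22.0 on `java_streams` alone, 13.0 on `spark` alone), which is the kind of saving cross-platform assignment aims for.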
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Scientific Computing and Data Management
