Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach

Hernan Picatto; Georg Heiler; Peter Klimek

arXiv:2408.11635·cs.DC·August 22, 2024

TL;DR

This paper presents a cost-effective, multi-platform data orchestration framework using Dagster that improves performance and reduces operational costs compared to traditional PaaS solutions like EMR and Databricks.

Contribution

It introduces a flexible, vendor-agnostic orchestration approach with significant cost savings and performance improvements for big data processing.

Findings

1. Achieved 12% performance improvement over EMR.
2. Realized 40% cost reduction compared to Databricks.
3. Saved over 300 euros per pipeline run.

Abstract

The rapid advancement of big data technologies has underscored the need for robust and efficient data processing solutions. Traditional Spark-based Platform-as-a-Service (PaaS) solutions, such as Databricks and Amazon Web Services Elastic MapReduce, provide powerful analytics capabilities but often result in high operational costs and vendor lock-in issues. These platforms, while user-friendly, can lead to significant inefficiencies due to their cost structures and lack of transparent pricing. This paper introduces a cost-effective and flexible orchestration framework using Dagster. Our solution aims to reduce dependency on any single PaaS provider by integrating various Spark execution environments. We demonstrate how Dagster's orchestration capabilities can enhance data processing efficiency, enforce best coding practices, and significantly reduce operational costs. In our…

Code & Models

Repositories

ascii-supply-networks/ascii-hydra (official)

Taxonomy

Topics: Big Data and Business Intelligence