sAirflow: Adopting Serverless in a Legacy Workflow Scheduler
Filip Mikina, Pawel Zuk, Krzysztof Rzadca

TL;DR
sAirflow demonstrates how to adapt a legacy workflow scheduler to serverless environments using minimal code changes, achieving significant cost savings and improved scalability compared to traditional managed solutions.
Contribution
This paper introduces sAirflow, the first serverless adaptation of Airflow's control plane and workers that maintains interface compatibility with minimal code modifications.
Findings
sAirflow halves monetary costs compared to MWAA.
sAirflow scales to 125 workers in seconds, reducing workflow makespan by up to 7x.
On warm systems, sAirflow performs comparably to MWAA.
Abstract
Serverless clouds promise efficient scaling and reduced toil and monetary costs. Yet making a complex legacy application serverless might require major refactoring and is thus risky. As a case study, we use Airflow, an industry-standard workflow system. To reduce migration risk, we propose to limit code modifications by relying on change data capture (CDC) and message queues for internal communication. To achieve serverless efficiency, we rely on Function-as-a-Service (FaaS). Our system, sAirflow, is the first adaptation of the control plane and workers to the serverless cloud, and it maintains the same interface and most of the code. Experimentally, we show that sAirflow delivers the key serverless benefits: scaling and cost reduction. We compare sAirflow to MWAA, a managed (SaaS) Airflow. On Alibaba benchmarks on warm systems, sAirflow performs similarly while halving the monetary cost.…
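The abstract's core idea, using change data capture and message queues so that database writes drive function invocations, can be illustrated with a toy, in-process sketch. This is not sAirflow's code: `MetadataStore`, `ChangeEvent`, `worker_function`, `route`, and the `task_instance` table layout are all hypothetical names chosen for illustration, and Python's `queue.Queue` stands in for a cloud message queue. The sketch only shows the pattern in which existing code keeps writing rows, while a change stream, rather than a polling loop, decides which function runs next.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """One captured row change (hypothetical shape, for illustration only)."""
    table: str
    op: str
    row: dict


class MetadataStore:
    """Toy stand-in for a scheduler's metadata database with CDC enabled."""

    def __init__(self, change_log):
        self.rows = {}
        self._change_log = change_log

    def upsert(self, table, key, row):
        # Callers just write rows; the change capture happens as a side effect.
        op = "update" if (table, key) in self.rows else "insert"
        self.rows[(table, key)] = dict(row)
        self._change_log.put(ChangeEvent(table, op, dict(row)))


def worker_function(event, store):
    """Hypothetical FaaS handler: 'runs' one queued task and records success."""
    row = dict(event.row, state="success")
    store.upsert("task_instance", row["task_id"], row)


def route(event, store):
    """Dispatch a change event to a function; ignore changes nobody handles."""
    if event.table == "task_instance" and event.row.get("state") == "queued":
        worker_function(event, store)


def drain(change_log, store):
    """Process change events until the queue is empty; return how many were seen."""
    handled = 0
    while not change_log.empty():
        route(change_log.get(), store)
        handled += 1
    return handled


def run_demo():
    change_log = queue.Queue()
    store = MetadataStore(change_log)
    # A scheduler-like component marks a task as queued by writing a row.
    store.upsert("task_instance", "extract", {"task_id": "extract", "state": "queued"})
    handled = drain(change_log, store)
    return handled, store.rows[("task_instance", "extract")]["state"]
```

In this sketch, calling `run_demo()` processes two change events: the initial `queued` write, which triggers the worker, and the worker's own `success` write, which no handler acts on. In a real deployment the queue and the functions would be managed cloud services, and that is where the scaling and cost behavior reported above would come from.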
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management
