An Empirical Study of Developers' Challenges in Implementing Workflows as Code: A Case Study on Apache Airflow
Jerin Yasmin, Jiale Wang, Yuan Tian, Bram Adams

TL;DR
This study analyzes 1,000 Stack Overflow posts to identify common challenges faced by developers implementing Workflows as Code with Apache Airflow, revealing key obstacles, root causes, and documentation reliance.
Contribution
It provides a hierarchical taxonomy of Airflow-related challenges and uncovers the main root causes and documentation sources used by developers.
Findings
Most challenges occur during workflow definition and execution
Incorrect configuration and complex setup are primary root causes
External documentation is frequently referenced for problem resolution
Abstract
The Workflows as Code paradigm is becoming increasingly essential to streamline the design and management of complex processes within data-intensive software systems. These systems require robust capabilities to process, analyze, and extract insights from large datasets. Workflow orchestration platforms such as Apache Airflow are pivotal in meeting these needs, as they effectively support the implementation of the Workflows as Code paradigm. Nevertheless, despite its considerable advantages, developers still face challenges due to the specialized demands of workflow orchestration and the complexities of distributed execution environments. In this paper, we manually study 1,000 sampled Stack Overflow posts derived from 9,591 Airflow-related questions to understand developers' challenges and root causes while implementing Workflows as Code. Our analysis results in a hierarchical taxonomy…
Taxonomy
Topics: Business Process Modeling and Analysis · Scientific Computing and Data Management · Cloud Computing and Resource Management
