Bauplan: zero-copy, scale-up FaaS for data pipelines
Jacopo Tagliabue, Tyler Caraza-Harter, Ciro Greco

TL;DR
Bauplan introduces a specialized FaaS model optimized for data pipelines, enabling declarative DAGs and efficient execution, thus improving performance and developer experience for data workloads.
Contribution
The paper presents bauplan, a novel FaaS programming model and serverless runtime for data pipelines that trades generality for data-awareness and efficiency.
Findings
Achieves better performance on data workloads
Provides a more developer-friendly experience
Efficient execution of declarative DAGs
Abstract
Chaining functions for longer workloads is a key use case for FaaS platforms in data applications. However, modern data pipelines differ significantly from typical serverless use cases (e.g., webhooks and microservices); this makes it difficult to retrofit existing pipeline frameworks due to structural constraints. In this paper, we describe these limitations in detail and introduce bauplan, a novel FaaS programming model and serverless runtime designed for data practitioners. bauplan enables users to declaratively define functional Directed Acyclic Graphs (DAGs) along with their runtime environments, which are then efficiently executed on cloud-based workers. We show that bauplan achieves both better performance and a superior developer experience for data workloads by making the trade-off of reducing generality in favor of data-awareness.
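To make the idea of a declaratively defined functional DAG with per-function runtime environments more concrete, here is a minimal Python sketch. It is not bauplan's API: the names `node`, `REGISTRY`, and `run`, the package name `example-lib`, and the convention of inferring parent nodes from parameter names are all assumptions made for illustration, and the "runtime environment" is only recorded as metadata rather than provisioned on a cloud worker.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical registry: node name -> (function, declared environment).
REGISTRY = {}


def node(python="3.11", packages=None):
    """Hypothetical decorator: register a function as a DAG node
    together with the runtime environment it declares."""
    def wrap(fn):
        REGISTRY[fn.__name__] = (fn, {"python": python, "packages": packages or {}})
        return fn
    return wrap


@node(packages={"example-lib": "1.0"})  # made-up package, for illustration only
def raw_trips():
    return [{"miles": 2.0}, {"miles": 0.0}, {"miles": 5.5}]


@node()
def clean_trips(raw_trips):
    # Parent is declared implicitly: the parameter name matches a node name.
    return [row for row in raw_trips if row["miles"] > 0]


@node()
def total_miles(clean_trips):
    return sum(row["miles"] for row in clean_trips)


def run(registry):
    """Run every node in dependency order, in-process."""
    # Map each node to its parents, read from the function signature.
    graph = {
        name: set(inspect.signature(fn).parameters)
        for name, (fn, _env) in registry.items()
    }
    results = {}
    for name in TopologicalSorter(graph).static_order():
        fn, _env = registry[name]
        # A real runtime would provision _env on a worker; this sketch skips that.
        results[name] = fn(**{parent: results[parent] for parent in graph[name]})
    return results


results = run(REGISTRY)
```

In this sketch the user never writes the execution order: it is derived from the function signatures, which is one way a platform could see the whole pipeline before running any of it.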
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Simulation Techniques and Applications
