TL;DR
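This paper introduces Ballet, a lightweight framework with an accompanying cloud-based development environment for collaborative, open-source data science. Collaborators propose feature definitions incrementally, and each proposal is automatically evaluated for its effect on ML performance, which targets the difficulty of scaling data science collaborations beyond individuals and small teams.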
Contribution
The paper presents Ballet, a novel framework and programming model that enhances collaborative data science development through incremental feature proposals and automated performance evaluation.
Findings
Case study of an income prediction problem with 27 collaborators
Feature proposals that pass an ML performance evaluation can be merged automatically into an executable feature engineering pipeline
Facilitates scalable, collaborative feature engineering in data science
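The propose, evaluate, and merge loop summarized above can be illustrated with a minimal sketch. This is not Ballet's actual API: the names `propose`, `centroid_accuracy`, and `min_gain`, the toy data, and the choice of a nearest-centroid classifier as the evaluation model are all assumptions made for illustration. The idea shown is only that a candidate feature joins the pipeline when it improves held-out performance by some margin.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Feature = Callable[[Row], float]  # hypothetical: a feature maps one raw row to one value


def transform(features: Sequence[Feature], rows: Sequence[Row]) -> List[List[float]]:
    """Apply every accepted feature to every row (the 'feature engineering pipeline')."""
    return [[f(r) for f in features] for r in rows]


def centroid_accuracy(X_train, y_train, X_val, y_val) -> float:
    """Validation accuracy of a nearest-centroid classifier (toy stand-in model)."""
    labels = sorted(set(y_train))
    if not X_train or not X_train[0]:
        # With no features, fall back to predicting the majority class.
        majority = max(labels, key=y_train.count)
        return sum(y == majority for y in y_val) / len(y_val)
    centroids = {}
    for label in labels:
        members = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        return min(labels, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return sum(predict(x) == y for x, y in zip(X_val, y_val)) / len(y_val)


def propose(accepted: Sequence[Feature], candidate: Feature,
            train: Sequence[Row], val: Sequence[Row], target: str,
            min_gain: float = 0.01) -> Tuple[List[Feature], bool]:
    """Merge the candidate only if it raises validation accuracy by at least min_gain."""
    y_train = [r[target] for r in train]
    y_val = [r[target] for r in val]

    def score(features):
        return centroid_accuracy(transform(features, train), y_train,
                                 transform(features, val), y_val)

    with_candidate = list(accepted) + [candidate]
    if score(with_candidate) - score(accepted) >= min_gain:
        return with_candidate, True
    return list(accepted), False


# Toy income-style data (invented for this sketch).
TRAIN = [{"education_years": e, "high_income": y}
         for e, y in [(8, 0), (9, 0), (10, 0), (16, 1), (17, 1), (18, 1)]]
VAL = [{"education_years": e, "high_income": y}
       for e, y in [(9, 0), (11, 0), (15, 1), (17, 1)]]

pipeline, merged_first = propose([], lambda r: r["education_years"], TRAIN, VAL, "high_income")
pipeline, merged_second = propose(pipeline, lambda r: 1.0, TRAIN, VAL, "high_income")
```

In this toy run the education feature is merged because it improves held-out accuracy, while the constant feature is rejected because it adds nothing. A real system would likely use a stronger model and a more careful acceptance test than a fixed accuracy margin.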
Abstract
While the open-source software development model has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small teams. We describe challenges to scaling data science collaborations and present a conceptual framework and ML programming model to address them. We instantiate these ideas in Ballet, a lightweight framework for collaborative, open-source data science through a focus on feature engineering, and an accompanying cloud-based development environment. Using our framework, collaborators incrementally propose feature definitions to a repository which are each subjected to an ML performance evaluation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct a case study analysis of an income prediction problem with 27 collaborators, and discuss…
