Meeting in the notebook: a notebook-based environment for micro-submissions in data science collaborations
Micah J. Smith, Jürgen Cito, Kalyan Veeramachaneni

TL;DR
Assemblé is a notebook-based environment that simplifies collaborative data science by enabling code contributions and version control within JupyterLab, improving workflow robustness.
Contribution
The paper introduces Assemblé, an environment that integrates version control and collaboration directly into JupyterLab for data science workflows.
Findings
A user study with 23 data scientists showed improved collaboration.
Assemblé reduced the complexity of version control in notebooks.
Participants found Assemblé easier to use than traditional methods.
Abstract
Developers in data science and other domains frequently use computational notebooks to create exploratory analyses and prototype models. However, they often struggle to incorporate existing software engineering tooling into these notebook-based workflows, leading to fragile development processes. We introduce Assemblé, a new development environment for collaborative data science projects, in which promising code fragments of data science pipelines can be contributed as pull requests to an upstream repository entirely from within JupyterLab, abstracting away low-level version control tool usage. We describe the design and implementation of Assemblé and report on a user study of 23 data scientists.
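To make the idea of contributing a single code fragment from a notebook more concrete, here is a minimal sketch of how one code cell might be pulled out of a notebook and packaged as a small submission. It is not Assemblé's implementation: the function names `extract_code_cell` and `build_submission`, the shape of the returned dictionary, and the path `features/scale.py` are all hypothetical, chosen only for illustration. The sketch relies solely on the standard notebook JSON layout (a `cells` list whose entries carry `cell_type` and `source`), and it stops before any version control step.

```python
def extract_code_cell(notebook: dict, cell_index: int) -> str:
    """Return the source text of one code cell from notebook JSON (nbformat v4 layout)."""
    cell = notebook["cells"][cell_index]
    if cell["cell_type"] != "code":
        raise ValueError(f"cell {cell_index} is not a code cell")
    source = cell["source"]
    # The notebook format stores source either as one string or as a list of lines.
    return source if isinstance(source, str) else "".join(source)


def build_submission(notebook: dict, cell_index: int, module_path: str) -> dict:
    """Package one cell as a hypothetical micro-submission (illustrative structure only)."""
    code = extract_code_cell(notebook, cell_index)
    if not code.endswith("\n"):
        code += "\n"  # make sure the resulting file ends with a newline
    return {
        "path": module_path,             # where the fragment would live upstream (assumed)
        "content": code,                 # file body to propose
        "title": f"Add {module_path}",   # assumed pull request title
    }


# A tiny in-memory notebook: one markdown cell, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Exploration"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["def scale(x):\n", "    return x / 10"],
        },
    ],
}

submission = build_submission(notebook, 1, "features/scale.py")
print(submission["title"])
print(submission["content"], end="")
```

In a real tool, a structure like `submission` would then be handed to whatever layer creates the branch, commit, and pull request; that layer is deliberately left out here because the summary does not describe how it works.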
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Data Quality and Management
