Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Deepthi Raghunandan, Aayushi Roy, Shenzhi Shi, Niklas Elmqvist, and Leilani Battle

TL;DR
This paper introduces a quantitative method to analyze how data scientists' sensemaking behaviors evolve over time within Jupyter notebooks, revealing diverse activities and informing tool design.
Contribution
It presents the first automated approach to measure and analyze sensemaking shifts in data science notebooks over multiple iterations.
Findings
Authors engage in diverse sensemaking activities over time.
Notebook scores reveal shifts between exploration and explanation.
The observed behaviors motivate design recommendations for enhancing notebook tools.
Abstract
Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measure shifts in sensemaking behaviors over time -- between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process and thus our ability to design tools to fully support sensemaking. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize…
