What makes an Expert? Comparing Problem-solving Practices in Data Science Notebooks
Manuel Valle Torre, Marcus Specht, Catharine Oertel

TL;DR
This study empirically compares the problem-solving practices of data science experts and novices in Jupyter notebooks. It finds that expertise is characterized by workflow structure and by iterative, efficient actions, not by different transitions between data science phases.
Contribution
It introduces a multi-level sequence analysis of notebook actions to distinguish expert from novice problem-solving strategies in data science.
Findings
Experts use shorter, more iterative workflows.
Novices follow longer, linear processes.
Workflow structure and action patterns differentiate expertise.
Abstract
The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of empirically understanding how the problem-solving processes of experts and novices differ. We apply a multi-level sequence analysis to 440 Jupyter notebooks from a public dataset, mapping low-level coding actions to higher-level problem-solving practices. Our findings reveal that experts do not follow fundamentally different transitions between data science phases (e.g., Data Import, EDA, Model Training, Visualization) than novices do. Instead, expertise is distinguished by the overall workflow structure from a problem-solving perspective and by fine-grained, cell-level action patterns. Novices tend to follow long, linear processes, whereas experts employ shorter, more iterative strategies enacted through efficient,…
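To make the idea of mapping low-level actions to phases more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the action names, the `ACTION_TO_PHASE` mapping, the `revisit_count` measure, and the two toy sequences are all hypothetical, chosen only to illustrate how a linear workflow and an iterative one can differ in structure.

```python
from collections import Counter

# Hypothetical mapping from low-level cell actions to data science phases.
# The phase names come from the abstract; the action names are assumptions.
ACTION_TO_PHASE = {
    "read_csv": "Data Import",
    "describe": "EDA",
    "plot": "Visualization",
    "fit": "Model Training",
}


def to_phases(actions):
    """Map each action to its phase and collapse consecutive repeats."""
    phases = []
    for action in actions:
        phase = ACTION_TO_PHASE[action]
        if not phases or phases[-1] != phase:
            phases.append(phase)
    return phases


def transition_counts(phases):
    """Count how often each (from_phase, to_phase) pair occurs."""
    return Counter(zip(phases, phases[1:]))


def revisit_count(phases):
    """Count steps that return to an already-visited phase.

    This is a crude, assumed proxy for how iterative a workflow is.
    """
    seen = set()
    revisits = 0
    for phase in phases:
        if phase in seen:
            revisits += 1
        seen.add(phase)
    return revisits


# Toy sequences invented for illustration, not taken from the dataset.
linear = ["read_csv", "describe", "describe", "plot", "fit"]
iterative = ["read_csv", "describe", "fit", "describe", "fit"]

linear_phases = to_phases(linear)
iterative_phases = to_phases(iterative)

print(revisit_count(linear_phases), revisit_count(iterative_phases))
print(transition_counts(iterative_phases))
```

In this toy setup the linear sequence never returns to an earlier phase, while the iterative one loops between EDA and Model Training. A real analysis would need a validated action taxonomy and a proper sequence-analysis method in place of these simple counts.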
Taxonomy
Topics: Statistics Education and Methodologies · Data Visualization and Analytics · Computational and Text Analysis Methods
