Data Makes Better Data Scientists
Jinjin Zhao, Avigdor Gal, Sanjay Krishnan

TL;DR
This paper introduces a framework for logging and analyzing code execution in Jupyter notebooks to understand data science workflows and extract best practices, demonstrated through an experiment with students.
Contribution
It presents an early prototype framework for tracking data science code execution and insights, enabling analysis of real-world practices in educational settings.
Findings
Framework successfully logs code execution and insights
Experiment reveals common patterns in student data science projects
Potential to improve data science education and practice
Abstract
With the goal of identifying common practices in data science projects, this paper proposes a framework for logging and understanding incremental code executions in Jupyter notebooks. The framework aims to support reasoning about how insights are generated in data science and to distill key observations into best practices for data science in the wild. We present an early prototype of the framework and report an experiment that logged a machine learning project for 25 undergraduate students.
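To make the idea of logging incremental code executions concrete, here is a minimal sketch in plain Python. It is not the authors' prototype: the names `ExecutionRecord` and `ExecutionLog` and the recorded fields (source, success, error, timing, newly defined names) are assumptions chosen for illustration, and cells are simulated by passing source strings to `exec` rather than by hooking a real Jupyter kernel.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ExecutionRecord:
    """One logged execution (hypothetical schema, not taken from the paper)."""
    index: int
    source: str
    succeeded: bool
    error: Optional[str]
    started_at: float
    duration_s: float
    new_names: List[str]


@dataclass
class ExecutionLog:
    """Runs code snippets in one shared namespace and records each run."""
    namespace: Dict[str, Any] = field(default_factory=dict)
    records: List[ExecutionRecord] = field(default_factory=list)

    def run(self, source: str) -> ExecutionRecord:
        names_before = set(self.namespace)
        started_at = time.time()
        t0 = time.perf_counter()
        error = None
        try:
            # State persists across calls, as it does across notebook cells.
            exec(source, self.namespace)
        except Exception as exc:
            # Keep going after a failure; failed attempts are part of the log.
            error = f"{type(exc).__name__}: {exc}"
        duration_s = time.perf_counter() - t0
        # Names defined by this run, ignoring dunder entries such as __builtins__.
        new_names = sorted(
            name for name in set(self.namespace) - names_before
            if not name.startswith("__")
        )
        record = ExecutionRecord(
            index=len(self.records),
            source=source,
            succeeded=error is None,
            error=error,
            started_at=started_at,
            duration_s=duration_s,
            new_names=new_names,
        )
        self.records.append(record)
        return record


# Simulate three incremental "cell" executions, the last one failing.
log = ExecutionLog()
log.run("x = [1, 2, 3]")
log.run("total = sum(x)")
log.run("undefined_name + 1")

for rec in log.records:
    print(rec.index, rec.succeeded, rec.new_names, rec.error)
```

In a live notebook, similar records could plausibly be collected from IPython's `pre_run_cell` and `post_run_cell` event callbacks instead of calling `exec` directly; that wiring is omitted here to keep the sketch self-contained.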
Taxonomy
Topics: Big Data and Business Intelligence · Artificial Intelligence in Healthcare · Machine Learning in Healthcare
