Data Makes Better Data Scientists

Jinjin Zhao; Avigdor Gal; Sanjay Krishnan

arXiv:2405.17690 · cs.HC · May 29, 2024

TL;DR

This paper introduces a framework for logging and analyzing code execution in Jupyter notebooks to understand data science workflows and extract best practices. It demonstrates the framework through an experiment that logs a machine learning project carried out by 25 undergraduate students.

Contribution

It presents an early prototype framework for tracking data science code execution and insights, enabling analysis of real-world practices in educational settings.

Findings

1. Framework successfully logs code execution and insights
2. Experiment reveals common patterns in student data science projects
3. Potential to improve data science education and practice

Abstract

With the goal of identifying common practices in data science projects, this paper proposes a framework for logging and understanding incremental code executions in Jupyter notebooks. The framework aims to support reasoning about how insights are generated in data science and to distill key observations of data science in the wild into best practices. In this paper, we present an early prototype of this framework and run an experiment that logs a machine learning project carried out by 25 undergraduate students.
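The summary does not describe how the prototype captures executions, so the following is only an illustration of what "logging incremental code executions" can mean in practice. It assumes one plausible mechanism: an IPython `post_run_cell` event hook that appends each executed cell's source, execution count, and success flag to an in-memory log, which can be exported as JSON Lines. The names `ExecutionRecord`, `ExecutionLog`, `reexecuted_sources`, and `attach_to_ipython` are hypothetical and are not the authors' framework.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ExecutionRecord:
    """One logged cell execution (hypothetical schema, not the paper's format)."""
    execution_count: Optional[int]
    source: str
    success: bool
    timestamp: float


@dataclass
class ExecutionLog:
    """Append-only log of cell executions, kept in the order they ran."""
    records: List[ExecutionRecord] = field(default_factory=list)

    def log(self, execution_count, source, success, timestamp=None):
        # Use the current wall-clock time unless a timestamp is supplied.
        ts = time.time() if timestamp is None else timestamp
        rec = ExecutionRecord(execution_count, source, success, ts)
        self.records.append(rec)
        return rec

    def reexecuted_sources(self):
        """Cell sources run more than once, in order of first appearance.

        A crude, assumed signal of iterative refinement (e.g. fixing a
        failing cell and running it again).
        """
        counts = {}
        for r in self.records:
            counts[r.source] = counts.get(r.source, 0) + 1
        return [src for src, n in counts.items() if n > 1]

    def to_jsonl(self):
        # One JSON object per line, so logs can be appended and parsed later.
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


def attach_to_ipython(log):
    """Register a post_run_cell hook if running inside IPython/Jupyter.

    Returns True if the hook was registered, False otherwise.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    ip = get_ipython()
    if ip is None:  # IPython is installed but we are not inside a shell
        return False

    def _on_post_run_cell(result):
        # result is an IPython ExecutionResult; info.raw_cell holds the source.
        log.log(result.execution_count, result.info.raw_cell, result.success)

    ip.events.register("post_run_cell", _on_post_run_cell)
    return True
```

In a notebook, one would create `log = ExecutionLog()` and call `attach_to_ipython(log)` once near the top; every later cell execution, including failed runs and re-runs, is then recorded in order. Keeping failed and repeated executions, rather than only the final notebook state, is what makes incremental behavior (such as editing one cell repeatedly until it succeeds) visible for later analysis.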

Taxonomy

Topics: Big Data and Business Intelligence · Artificial Intelligence in Healthcare · Machine Learning in Healthcare