Understanding Feedback Mechanisms in Machine Learning Jupyter Notebooks

Arumoy Shome; Luis Cruz; Diomidis Spinellis; Arie van Deursen

arXiv:2408.00153 · cs.SE · August 2, 2024

TL;DR

This paper investigates feedback mechanisms in machine learning development with Jupyter notebooks. It finds that implicit feedback is prevalent and proposes automated validation to improve workflow reliability and reproducibility.

Contribution

This paper is the first to systematically analyze feedback mechanisms in ML notebooks. It categorizes them into implicit and explicit types and highlights opportunities for automation and better documentation.
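A minimal sketch of what the three mechanisms can look like in consecutive notebook cells, assuming that assertions are the explicit form of feedback and that printed or displayed output is the implicit form. The records and variable names are invented for illustration and do not come from the paper.

```python
# Hypothetical notebook cells illustrating the three feedback mechanisms.
# The data below is invented for illustration.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 29, "income": 48000.0},
    {"age": 41, "income": None},
]

# Cell 1: implicit feedback via a print statement.
# The developer reads the output and judges it by eye.
missing = sum(1 for r in rows if r["income"] is None)
print("rows:", len(rows), "missing income:", missing)

# Cell 2: implicit feedback via a last cell statement.
# Jupyter displays the value of a cell's final bare expression;
# run as a plain script, this line has no visible effect.
missing

# Cell 3: explicit feedback via assertions.
# The expectation is written down as code and checked on every run.
clean = [r for r in rows if r["income"] is not None]
assert all(r["income"] > 0 for r in clean), "income must be positive"
assert len(clean) == len(rows) - missing, "dropped an unexpected number of rows"
```

The first two cells only surface information; whether it is correct is left to the person reading the output. The third cell encodes the same judgement so that a later change to the data cannot silently violate it.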

Findings

1. Implicit feedback dominates critical design decisions.
2. Explicit feedback mechanisms are underused.
3. Automated validation via assertions can improve ML workflow reliability.
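To make the third finding concrete, here is a minimal sketch of automated validation via assertions: checks that a developer might otherwise print and inspect by eye (split overlap, class balance) are turned into a reusable helper that fails loudly. The function `check_split`, its parameters, and the 10% class threshold are hypothetical choices for illustration, not a method from the paper.

```python
from collections import Counter


def check_split(train_ids, test_ids, train_labels, min_fraction=0.1):
    """Hypothetical helper: replace eyeballed print output with assertions
    that are re-checked every time the notebook runs."""
    # Explicit check 1: no sample appears in both splits (data leakage).
    overlap = set(train_ids) & set(test_ids)
    assert not overlap, f"leakage: {sorted(overlap)} appear in both splits"

    # Explicit check 2: every class makes up at least `min_fraction` of the
    # training labels (the threshold is an illustrative assumption).
    counts = Counter(train_labels)
    total = len(train_labels)
    assert total > 0, "training split is empty"
    for label, n in counts.items():
        assert n / total >= min_fraction, (
            f"class {label!r} is only {n / total:.1%} of the training labels"
        )
    return counts


# Passes: disjoint splits and balanced labels.
print(check_split([1, 2, 3, 4], [5, 6], ["a", "b", "a", "b"]))

# Fails: sample 2 is in both splits, so an AssertionError is raised.
try:
    check_split([1, 2], [2, 3], ["a", "b"])
except AssertionError as err:
    print("validation failed:", err)
```

One design caveat: Python strips `assert` statements when run with `-O`, so checks that must also hold outside the notebook are safer as explicit `raise` statements.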

Abstract

The machine learning development lifecycle is characterized by iterative and exploratory processes that rely on feedback mechanisms to ensure data and model integrity. Despite the critical role of feedback in machine learning engineering, no prior research has been conducted to identify and understand these mechanisms. To address this knowledge gap, we mine 297.8 thousand Jupyter notebooks and analyse 2.3 million code cells. We identify three key feedback mechanisms (assertions, print statements, and last cell statements) and further categorize them into implicit and explicit forms of feedback. Our findings reveal extensive use of implicit feedback for critical design decisions and relatively limited adoption of explicit feedback mechanisms. By conducting detailed case studies with selected feedback instances, we uncover the potential for automated validation of critical…
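To illustrate what identifying these mechanisms in mined notebooks could involve, here is a minimal sketch, not the authors' mining pipeline. It reads an nbformat-4 `.ipynb` document with `json`, extracts the code cells, and counts the three mechanisms with Python's `ast` module. The function names and heuristics are assumptions. A print statement is taken to be any call to the built-in name `print`. A last cell statement is taken to be a bare expression, other than `print(...)`, that ends the cell, since Jupyter displays its value.

```python
import ast
import json
from collections import Counter


def code_cells(notebook_json: str) -> list[str]:
    """Return the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_json)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


def is_print_call(node: ast.AST) -> bool:
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


def classify_cell(source: str) -> Counter:
    """Count feedback mechanisms in one cell (hypothetical heuristic)."""
    counts = Counter()
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells using IPython magics (e.g. %matplotlib inline) are skipped.
        return counts
    for node in ast.walk(tree):
        if isinstance(node, ast.Assert):
            counts["assertion"] += 1
        elif is_print_call(node):
            counts["print"] += 1
    # A bare final expression is displayed by Jupyter as the cell's output.
    last = tree.body[-1] if tree.body else None
    if isinstance(last, ast.Expr) and not is_print_call(last.value):
        counts["last_cell_statement"] += 1
    return counts


# Usage on a tiny invented notebook.
notebook = json.dumps({"cells": [
    {"cell_type": "markdown", "source": "# Exploration"},
    {"cell_type": "code", "source": ["x = [1, 2, 3]\n", "print(len(x))\n", "x"]},
    {"cell_type": "code", "source": "assert sum(x) == 6"},
    {"cell_type": "code", "source": "%matplotlib inline"},
]})
total = Counter()
for src in code_cells(notebook):
    total += classify_cell(src)
print(dict(total))
```

Treating `assertion` as explicit feedback and the other two counts as implicit feedback gives a per-notebook split along the lines the abstract describes. The heuristic is deliberately rough. It skips any cell containing IPython magics, which `ast.parse` rejects. It also cannot see a trailing `;`, which Jupyter uses to suppress display, so such cells are over-counted as last cell statements.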

Taxonomy

Topics: Online Learning and Analytics