JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks

Yiran Wang; José Antonio Hernández López; Ulf Nilsson; Dániel Varró

arXiv:2510.18013·cs.SE·May 1, 2026

TL;DR

JunoBench is a benchmark dataset of 111 curated, reproducible real-world crashes in Python machine learning notebooks, intended to support research on bug detection, localization, diagnosis, and repair.

Contribution

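The paper introduces the first curated, reproducible benchmark of real-world crashes from public Kaggle notebooks, covering popular ML libraries (e.g., TensorFlow/Keras, PyTorch, Scikit-learn) and notebook-specific out-of-order execution errors.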

Findings

01 Includes 111 curated, reproducible crashes with verified fixes.

02 Provides richly annotated labels of crash characteristics and natural-language diagnostic annotations.

03 Ensures reproducibility through a unified environment that reliably reproduces all crashes.

Abstract

Jupyter notebooks are widely used for machine learning (ML) prototyping. Yet few debugging tools are designed for ML code in notebooks, partly due to the lack of benchmarks. We introduce JunoBench, the first benchmark dataset of real-world crashes in Python-based ML notebooks. JunoBench includes 111 curated and reproducible crashes with verified fixes from public Kaggle notebooks, covering popular ML libraries (e.g., TensorFlow/Keras, PyTorch, Scikit-learn) and notebook-specific out-of-order execution errors. JunoBench ensures reproducibility and ease of use through a unified environment that reliably reproduces all crashes. By providing realistic crashes, their resolutions, richly annotated labels of crash characteristics, and natural-language diagnostic annotations, JunoBench facilitates research on bug detection, localization, diagnosis, and repair in notebook-based ML development.
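
To make the notion of a notebook-specific out-of-order execution error concrete, the following minimal sketch imitates a notebook kernel by running cells against one shared namespace. It is an illustration only and is not taken from JunoBench: the cell contents, the `cells` dictionary, and the `run_cells` helper are all hypothetical, and a real kernel does considerably more than call `exec`.

```python
# Hypothetical two-cell notebook: each string stands in for one cell.
cells = {
    1: "import math\nradius = 2.0",
    2: "area = math.pi * radius ** 2",
}


def run_cells(order):
    """Run cells in the given order against one shared namespace.

    This mimics (very roughly) how a notebook kernel keeps state
    between cell executions. It is an assumption-laden sketch,
    not how Jupyter is actually implemented.
    """
    namespace = {}
    for number in order:
        try:
            exec(cells[number], namespace)
        except Exception as error:
            # Report which cell crashed and with what exception type.
            return f"cell {number} crashed: {type(error).__name__}: {error}"
    return f"ok, area = {namespace['area']:.2f}"


in_order = run_cells([1, 2])      # cell 1 defines what cell 2 needs
out_of_order = run_cells([2, 1])  # cell 2 runs before its dependencies exist

print(in_order)
print(out_of_order)
```

Run top to bottom, the two cells succeed; run in the reverse order, the second cell crashes with a `NameError` because the names it depends on have not been defined yet. The code is identical in both runs, so the crash depends on execution order rather than on the source text alone.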
