Runtime-Augmented LLMs for Crash Detection and Diagnosis in ML Notebooks

Yiran Wang; José Antonio Hernández López; Ulf Nilsson; Dániel Varró

arXiv:2602.18537·cs.SE·February 24, 2026

TL;DR

CRANE-LLM enhances large language models with runtime data from ML notebooks to accurately detect and diagnose crashes, improving reliability in ML development environments.

Contribution

This work introduces CRANE-LLM, a novel method that combines static code and runtime information to improve crash detection and diagnosis in ML notebooks.

Findings

01. Runtime information improves detection accuracy by 7-10 percentage points.

02. Diagnosis F1-score increases by 8-11 points with runtime data.

03. Performance gains vary across ML libraries and crash types.

Abstract

Jupyter notebooks are widely used for machine learning (ML) development due to their support for interactive and iterative experimentation. However, ML notebooks are highly prone to bugs, with crashes being among the most disruptive. Despite their practical importance, systematic methods for crash detection and diagnosis in ML notebooks remain largely unexplored. We present CRANE-LLM, a novel approach that augments large language models (LLMs) with structured runtime information extracted from the notebook kernel state to detect and diagnose crashes before executing a target cell. Given previously executed cells and a target cell, CRANE-LLM combines static code context with runtime information, including object types, tensor shapes, and data attributes, to predict whether the target cell will crash (detection) and explain the underlying cause (diagnosis). We evaluate CRANE-LLM on…
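To make the abstract's description concrete, the following is a minimal sketch of how static code context and kernel-state information might be combined into a single prompt. It is not the authors' implementation: the function names (`describe_object`, `snapshot_namespace`, `build_prompt`), the JSON layout, and the prompt wording are all assumptions made for illustration, and `FakeTensor` is a hypothetical stand-in so the example runs without NumPy or PyTorch. No LLM is called; the sketch stops at prompt construction.

```python
import json
from typing import Any


def describe_object(obj: Any) -> dict:
    """Summarize one kernel variable: its type, plus shape/dtype/length when available."""
    info = {"type": type(obj).__name__}
    shape = getattr(obj, "shape", None)  # arrays, tensors, and DataFrames expose .shape
    if shape is not None:
        try:
            info["shape"] = [int(d) for d in shape]
        except (TypeError, ValueError):
            info["shape"] = str(shape)  # fall back for unknown or non-iterable shapes
    dtype = getattr(obj, "dtype", None)
    if dtype is not None:
        info["dtype"] = str(dtype)
    if "shape" not in info and hasattr(obj, "__len__"):
        info["len"] = len(obj)
    return info


def snapshot_namespace(namespace: dict) -> dict:
    """Describe every user-visible variable (names starting with '_' are skipped)."""
    return {
        name: describe_object(value)
        for name, value in sorted(namespace.items())
        if not name.startswith("_")
    }


def build_prompt(executed_cells: list, target_cell: str, namespace: dict) -> str:
    """Combine static code context with the runtime snapshot (hypothetical prompt format)."""
    history = "\n\n".join(
        f"# Cell {i}\n{src}" for i, src in enumerate(executed_cells, 1)
    )
    runtime = json.dumps(snapshot_namespace(namespace), indent=2)
    return (
        "Previously executed cells:\n" + history + "\n\n"
        "Runtime state (JSON):\n" + runtime + "\n\n"
        "Target cell:\n" + target_cell + "\n\n"
        "Will the target cell crash? Answer yes or no, then explain the cause."
    )


class FakeTensor:
    """Hypothetical stand-in for a tensor; only carries a shape and a dtype."""

    def __init__(self, *shape: int) -> None:
        self.shape = shape
        self.dtype = "float32"


# Example kernel state: X is 100x8 and w is 10x1, so `X @ w` would fail on a real tensor.
ns = {"X": FakeTensor(100, 8), "w": FakeTensor(10, 1), "labels": ["a", "b"]}
prompt = build_prompt(["X = load_features()", "w = init_weights()"], "y = X @ w", ns)
print(prompt)
```

In this example the source code alone does not reveal the mismatch between `X` and `w`; only the recorded shapes do, which is the kind of information the abstract says is taken from the kernel state. In a live notebook, the `namespace` argument could plausibly be the kernel's user namespace, though how CRANE-LLM actually extracts and structures this data is not specified in the text above.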

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Machine Learning and Algorithms