Runtime-Augmented LLMs for Crash Detection and Diagnosis in ML Notebooks
Yiran Wang, José Antonio Hernández López, Ulf Nilsson, Dániel Varró

TL;DR
CRANE-LLM enhances large language models with runtime data from ML notebooks to accurately detect and diagnose crashes, improving reliability in ML development environments.
Contribution
This work introduces CRANE-LLM, a novel method that combines static code and runtime information to improve crash detection and diagnosis in ML notebooks.
Findings
Runtime information improves detection accuracy by 7-10 percentage points.
Diagnosis F1-score increases by 8-11 points with runtime data.
Performance gains vary across ML libraries and crash types.
Abstract
Jupyter notebooks are widely used for machine learning (ML) development due to their support for interactive and iterative experimentation. However, ML notebooks are highly prone to bugs, with crashes being among the most disruptive. Despite their practical importance, systematic methods for crash detection and diagnosis in ML notebooks remain largely unexplored. We present CRANE-LLM, a novel approach that augments large language models (LLMs) with structured runtime information extracted from the notebook kernel state to detect and diagnose crashes before executing a target cell. Given previously executed cells and a target cell, CRANE-LLM combines static code context with runtime information, including object types, tensor shapes, and data attributes, to predict whether the target cell will crash (detection) and explain the underlying cause (diagnosis). We evaluate CRANE-LLM on…
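To make the idea of pairing static code context with runtime information more concrete, here is a minimal sketch of how a kernel namespace might be summarized into object types, shapes, and a few data attributes, then placed alongside the executed cells and the target cell in a prompt. It is an illustration under stated assumptions, not the CRANE-LLM implementation: the function names (`summarize_value`, `summarize_namespace`, `build_prompt`), the chosen attributes, and the prompt layout are all hypothetical, and the call to an LLM is left out.

```python
from typing import Any, Dict, List


def summarize_value(value: Any) -> Dict[str, Any]:
    """Collect a few cheap, structured facts about one kernel variable.

    Hypothetical helper: which attributes to record is an assumption.
    """
    summary: Dict[str, Any] = {"type": type(value).__name__}
    # Array-like objects (NumPy arrays, tensors, DataFrames) usually
    # expose a tuple-like `shape`; record it only when it is a tuple.
    shape = getattr(value, "shape", None)
    if isinstance(shape, tuple):
        summary["shape"] = tuple(shape)
    dtype = getattr(value, "dtype", None)
    if dtype is not None:
        summary["dtype"] = str(dtype)
    # Built-in containers get a length.
    if isinstance(value, (list, tuple, dict, set)):
        summary["len"] = len(value)
    return summary


def summarize_namespace(namespace: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Summarize every public variable, sorted by name for stable output."""
    return {
        name: summarize_value(value)
        for name, value in sorted(namespace.items())
        if not name.startswith("_")
    }


def build_prompt(executed_cells: List[str], target_cell: str,
                 namespace: Dict[str, Any]) -> str:
    """Combine static context and runtime facts into one prompt string."""
    lines = ["Previously executed cells:"]
    for index, source in enumerate(executed_cells, start=1):
        lines.append(f"[cell {index}]")
        lines.append(source)
    lines.append("Runtime state:")
    for name, facts in summarize_namespace(namespace).items():
        described = ", ".join(f"{key}={val}" for key, val in facts.items())
        lines.append(f"- {name}: {described}")
    lines.append("Target cell:")
    lines.append(target_cell)
    lines.append("Will the target cell crash? If so, explain the cause.")
    return "\n".join(lines)
```

In a live notebook, the `namespace` argument could plausibly be taken from the kernel's user variables (for example IPython's `user_ns` dictionary), though how the paper extracts kernel state is not specified in the text above. Recording only compact facts such as type and shape, rather than full values, keeps the prompt small while still exposing the kind of mismatch (for instance, incompatible tensor shapes) that static code alone does not reveal.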
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Machine Learning and Algorithms
