Static Analysis Driven Enhancements for Comprehension in Machine Learning Notebooks
Ashwin Prasad Shivarpatna Venkatesh, Samkutty Sabu, Mouli Chekkapalli, Jiawei Wang, Li Li, Eric Bodden

TL;DR
This paper introduces HeaderGen, a tool that automatically annotates machine learning notebooks with headers and classifications to improve readability and comprehension, supported by enhanced call graph analysis and type inference.
Contribution
HeaderGen extends existing call graph analysis with flow-sensitivity and external library support, achieving high accuracy in annotation and classification of ML code in notebooks.
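To see why flow-sensitivity can matter for call graph precision, consider the small sketch below. It is an illustrative assumption, not code from the paper: the classes `Loader` and `Trainer` and the function `pipeline` are hypothetical. An analysis that ignores statement order would merge both assignments to `step` and report both `run` methods as possible targets at each call site, whereas a flow-sensitive analysis can attribute each call to the single class assigned just before it.

```python
class Loader:
    def run(self):
        return "load"


class Trainer:
    def run(self):
        return "train"


def pipeline():
    results = []
    step = Loader()
    # At this point `step` can only refer to a Loader instance,
    # so the only feasible call target is Loader.run.
    results.append(step.run())
    step = Trainer()
    # After the reassignment, the only feasible target is Trainer.run.
    results.append(step.run())
    return results
```

Notebooks tend to reuse variable names such as `df` or `model` across cells, so this pattern of reassignment is plausibly common in the code HeaderGen targets.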
Findings
HeaderGen achieves 95.6% precision and 95.3% recall in call graph analysis.
HeaderGen's header generation has 85.7% precision and 92.8% recall.
HeaderGen helps users complete comprehension tasks faster.
Abstract
Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line visualizations. Data scientists use Jupyter notebooks as the de facto standard for creating and sharing machine-learning based solutions, primarily written in Python. Recent studies have demonstrated, however, that a large portion of Jupyter notebooks available on public platforms are undocumented and lack a narrative structure. This reduces the readability of these notebooks. To address this shortcoming, this paper presents HeaderGen, a novel tool-based approach that automatically annotates code cells with categorical markdown headers based on a taxonomy of ML operations, and classifies and displays function calls according to this taxonomy. For this functionality to be realized, HeaderGen enhances an existing call graph analysis in PyCG. To improve precision, HeaderGen extends PyCG's analysis…
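The idea of annotating a code cell with a categorical header can be sketched as a lookup from resolved function names to taxonomy categories. The following is a minimal sketch under stated assumptions, not HeaderGen's implementation: the `TAXONOMY` table, its category names, and the helper `header_for_cell` are hypothetical, and the input is assumed to be a list of fully qualified function names that a call graph analysis has already resolved for one cell.

```python
# Hypothetical mapping from fully qualified function names to ML phases.
# The real taxonomy used by HeaderGen is not reproduced here.
TAXONOMY = {
    "pandas.read_csv": "Data Loading",
    "pandas.DataFrame.dropna": "Data Preparation",
    "sklearn.linear_model.LogisticRegression.fit": "Model Training",
    "sklearn.metrics.accuracy_score": "Model Evaluation",
}


def header_for_cell(resolved_calls):
    """Build a markdown header from the calls resolved in one code cell.

    Returns None when none of the calls map to a known category.
    """
    phases = []
    for call in resolved_calls:
        phase = TAXONOMY.get(call)
        # Keep each category once, in order of first appearance.
        if phase is not None and phase not in phases:
            phases.append(phase)
    if not phases:
        return None
    return "## " + ", ".join(phases)
```

For example, a cell whose resolved calls are `pandas.read_csv` and `pandas.DataFrame.dropna` would receive the header `## Data Loading, Data Preparation` under this sketch. The lookup only works when calls are resolved to their defining library and class, which is why the quality of the call graph and type inference matters for the headers.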
Taxonomy
Topics: Software Engineering Research · Computational Physics and Python Applications
