A Static Analysis Framework for Data Science Notebooks
Pavle Subotić, Lazar Milikić, Milan Stojić

TL;DR
This paper introduces a static analysis framework tailored for data science notebooks that accounts for their unique execution semantics, improving correctness and reproducibility.
Contribution
It presents a general framework for static analysis of notebooks that accommodates a wide range of analyses and is shown to be efficient on 2211 real-world notebooks.
Findings
98.7% of notebooks analyzed in under a second
Framework supports diverse analysis types
Enhances notebook correctness and reproducibility
Abstract
Notebooks provide an interactive environment for programmers to develop code, analyse data and inject interleaved visualizations in a single environment. Despite their flexibility, a major pitfall that data scientists encounter is unexpected behaviour caused by the unique out-of-order execution model of notebooks. As a result, data scientists face challenges ranging from notebook correctness to reproducibility and cleaning. In this paper, we propose a framework that performs static analysis on notebooks, incorporating their unique execution semantics. Our framework is general in the sense that it accommodates a wide range of analyses, useful for various notebook use cases. We have instantiated our framework on a diverse set of analyses and have evaluated them on 2211 real-world notebooks. Our evaluation demonstrates that the vast majority (98.7%) of notebooks can be analysed in…
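To make the out-of-order execution pitfall concrete, here is a minimal sketch of a cell-level definition/use check. It is not the paper's framework: the helpers `cell_summary` and `undefined_uses`, and the three sample cells, are hypothetical names chosen for illustration. The sketch looks only at top-level statements and treats every name read inside a function body as a dependency of the cell, so it is deliberately coarse.

```python
import ast
import builtins

BUILTINS = set(dir(builtins))

def cell_summary(source):
    """Return (names the cell defines, names it needs from earlier cells)."""
    defined, needed = set(), set()
    for stmt in ast.parse(source).body:
        loads, stores = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stores.add(node.id)
                else:
                    loads.add(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    if alias.name != "*":
                        stores.add((alias.asname or alias.name).split(".")[0])
            elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                stores.add(node.name)
            elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
                # `x += 1` reads x before writing it.
                loads.add(node.target.id)
        # A name read by this statement must come from an earlier statement
        # in the same cell, from a builtin, or from a previously run cell.
        needed |= loads - defined - BUILTINS
        defined |= stores
    return defined, needed

def undefined_uses(cells, order):
    """Simulate running cells in `order`; report (cell index, name) pairs
    where a cell needs a name that no previously run cell has defined."""
    available, problems = set(), []
    for idx in order:
        defined, needed = cell_summary(cells[idx])
        for name in sorted(needed - available):
            problems.append((idx, name))
        available |= defined
    return problems

# Hypothetical three-cell notebook.
cells = [
    "import pandas as pd\ndf = pd.DataFrame({'a': [1, 2]})",
    "clean = df.dropna()",
    "print(clean.shape)",
]

print(undefined_uses(cells, [0, 1, 2]))  # top-to-bottom order
print(undefined_uses(cells, [0, 2, 1]))  # cell 2 run before cell 1
```

The top-to-bottom order reports no problems, while running the third cell before the second flags `clean` as undefined. Because the check only parses the cells and never executes them, it runs without the notebook's libraries installed.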
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
