Exploring the Jupyter Ecosystem: An Empirical Study of Bugs and Vulnerabilities
Wenyuan Jiang, Diany Pressato, Harsh Darji, Thibaud Lutellier

TL;DR
This paper presents an extensive empirical analysis of bugs and vulnerabilities in the Jupyter Notebook ecosystem, revealing common issues like configuration errors and API misuse, and assessing security risks in deployment frameworks.
Contribution
It offers the first large-scale empirical study of Jupyter Notebook bugs and vulnerabilities, including a bug taxonomy and security risk assessment.
Findings
Configuration issues are the most common bugs.
Incorrect API usage frequently occurs in notebooks.
Security vulnerabilities are linked to deployment frameworks.
Abstract
Background. Jupyter notebooks are one of the main tools used by data scientists. Notebooks include features (configuration scripts, markdown, images, etc.) that make them challenging to analyze compared to traditional software. As a result, existing software engineering models, tools, and studies do not capture the uniqueness of Notebook's behavior. Aims. This paper aims to provide a large-scale empirical study of bugs and vulnerabilities in the Notebook ecosystem. Method. We collected and analyzed a large dataset of Notebooks from two major platforms. Our methodology involved quantitative analyses of notebook characteristics (such as complexity metrics, contributor activity, and documentation) to identify factors correlated with bugs. Additionally, we conducted a qualitative study using grounded theory to categorize notebook bugs, resulting in a comprehensive bug taxonomy. Finally, we…
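As an illustration of the kind of quantitative notebook characteristics the Method mentions, here is a minimal sketch that derives a few size and documentation metrics from a notebook's JSON. The metrics (cell counts, non-blank code lines, share of markdown cells) and the function name `notebook_metrics` are assumptions chosen for illustration; this summary does not specify which metrics the authors actually computed. The sketch relies only on the documented `.ipynb` layout: a top-level `cells` list whose entries carry a `cell_type` and a `source` that is either a string or a list of strings.

```python
import json


def notebook_metrics(nb_json: str) -> dict:
    """Compute a few simple size/documentation metrics for one notebook.

    Hypothetical helper: the metrics are illustrative, not the paper's.
    """
    nb = json.loads(nb_json)
    metrics = {"code_cells": 0, "markdown_cells": 0, "code_lines": 0}
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        kind = cell.get("cell_type")
        if kind == "code":
            metrics["code_cells"] += 1
            # Count only non-blank lines of code.
            metrics["code_lines"] += sum(
                1 for line in source.splitlines() if line.strip()
            )
        elif kind == "markdown":
            metrics["markdown_cells"] += 1
    total = metrics["code_cells"] + metrics["markdown_cells"]
    # Share of markdown cells as a rough proxy for documentation.
    metrics["markdown_ratio"] = metrics["markdown_cells"] / total if total else 0.0
    return metrics


# A tiny in-memory notebook, so the example needs no files on disk.
example = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": "import math\n\nprint(math.pi)\n"},
    ],
})
result = notebook_metrics(example)
print(result)
```

Per-notebook metrics like these could then be correlated with bug counts across a dataset. Which features and statistical tests to use would depend on the study design.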
Taxonomy
Topics: Software Engineering Research · Information and Cyber Security · Web Application Security Vulnerabilities
