Bug Analysis in Jupyter Notebook Projects: An Empirical Study
Taijara Loiola de Santana, Paulo Anselmo da Mota Silveira Neto, Eduardo Santana de Almeida, and Iftekhar Ahmed

TL;DR
This paper provides a comprehensive empirical analysis of bugs in Jupyter Notebook projects, based on large-scale data mining, practitioner interviews, and bug taxonomy development, revealing key challenges faced by data scientists.
Contribution
It introduces a detailed bug taxonomy for Jupyter projects and offers insights into common bug categories, root causes, and development challenges from practitioners' perspectives.
Findings
Identified prevalent bug categories and their root causes.
Revealed common development challenges faced by Jupyter practitioners.
Proposed a taxonomy to classify Jupyter bugs systematically.
Abstract
Computational notebooks, such as Jupyter, have been widely adopted by data scientists to write code for analyzing and visualizing data. Despite their growing adoption and popularity, there has been no thorough study to understand Jupyter development challenges from the practitioners' point of view. This paper presents a systematic study of bugs and challenges that Jupyter practitioners face through a large-scale empirical investigation. We mined 14,740 commits from 105 GitHub open-source projects with Jupyter notebook code. Next, we analyzed 30,416 Stack Overflow posts which gave us insights into bugs that practitioners face when developing Jupyter notebook projects. Finally, we conducted nineteen interviews with data scientists to uncover more details about Jupyter bugs and to gain insights into Jupyter developers' challenges. We propose a bug taxonomy for Jupyter projects based on our…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Data Visualization and Analytics
