ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells
Sergey Titov, Yaroslav Golubev, Timofey Bryksin

TL;DR
ReSplit is an algorithm that automatically re-splits Jupyter notebook cells to improve their structure, making them more self-contained and easier to understand, based on analysis of code definition-usage chains.
Contribution
The paper introduces ReSplit, a novel algorithm for automatically re-splitting Jupyter notebook cells to enhance their modularity and clarity.
Findings
In a human evaluation, experts preferred the re-split notebook over the original in 29.5% of cases, or nearly a third.
The algorithm uses definition-usage chains to decide where cells should be merged and where they should be split.
Abstract
Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained actions. At the same time, research shows that real-world notebooks fail to do so and suffer from the lack of proper structure. To combat this, we propose ReSplit, an algorithm for an automatic re-splitting of cells in Jupyter notebooks. The algorithm analyzes definition-usage chains in the notebook and consists of two parts - merging and splitting the cells. We ran the algorithm on a large corpus of notebooks to evaluate its performance and its overall effect on notebooks, and evaluated it by human experts: we showed them several notebooks in…
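To make the idea of definition-usage analysis across cells concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`defs_and_uses`, `merge_candidates`) and the merge heuristic are assumptions made for illustration, and only the merging side is shown. The sketch treats each cell as a string of source code, collects the names it defines and uses with the standard `ast` module, and flags adjacent cells where the first cell's definitions are consumed by the next cell and by no later one.

```python
import ast


def defs_and_uses(source):
    """Return (defined names, used names) for one cell's source code."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def merge_candidates(cells):
    """Hypothetical heuristic: propose merging cells i and i+1 when names
    defined in cell i are used in cell i+1 and in no later cell."""
    info = [defs_and_uses(cell) for cell in cells]
    pairs = []
    for i in range(len(cells) - 1):
        defined_here = info[i][0]
        used_next = info[i + 1][1]
        used_later = set().union(*(uses for _, uses in info[i + 2:]))
        if defined_here & used_next and not defined_here & used_later:
            pairs.append((i, i + 1))
    return pairs


# Illustrative notebook: each string stands for one code cell.
example_cells = [
    "import pandas as pd",
    "df = pd.read_csv('data.csv')",
    "clean = df.dropna()",
    "print(clean.head())",
    "print(df.shape)",
]
print(merge_candidates(example_cells))
```

On this toy input the heuristic pairs the import with the cell that reads the data, and the cleaning step with the cell that prints its result, while leaving `df` on its own because a later cell still depends on it. A real tool would also need to handle redefinitions, Markdown cells, and the splitting direction.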
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
