Integrating Code Metrics into Automated Documentation Generation for Computational Notebooks
Mojtaba Mostafavi Ghahfarokhi, Hamed Jahantigh, Alireza Asadi, Abbas Heydarnoori

TL;DR
This paper explores how integrating code metrics into neural and LLM-based models improves automated documentation generation for computational notebooks, leading to more accurate and contextually relevant documentation.
Contribution
The paper introduces a specialized dataset and evaluates two modeling paradigms (neural network-based and LLM-based), demonstrating that code metrics enhance documentation quality in automated systems.
Findings
Incorporating code metrics improves BLEU-1 by 6%.
Code metrics increase ROUGE-L F1 by 3%.
Code metrics boost BERTScore F1 by 9%.
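For readers unfamiliar with the first score above, BLEU-1 is a clipped unigram precision multiplied by a brevity penalty. The block below is a minimal single-reference sketch of that definition, assuming simple lowercased whitespace tokenization. The function name `bleu1` is illustrative, and this is not the paper's evaluation script; the summary also does not say whether the reported gains are absolute or relative.

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Single-reference BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: whitespace tokenization on lowercased text (illustrative only).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision
```

Clipping keeps a candidate from scoring well by repeating a single common word, and the brevity penalty keeps very short outputs from winning on precision alone.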
Abstract
Effective code documentation is essential for collaboration, comprehension, and long-term software maintainability, yet developers often neglect it due to its repetitive nature. Automated documentation generation has evolved from heuristic and rule-based methods to neural network-based and large language model (LLM)-based approaches. However, existing methods often overlook structural and quantitative characteristics of code that influence readability and comprehension. Prior research suggests that code metrics capture information relevant to program understanding. Building on these insights, this paper investigates the role of source code metrics as auxiliary signals for automated documentation generation, focusing on computational notebooks, a popular medium among data scientists that integrates code, narrative, and results but suffers from inconsistent documentation. We propose a…
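To make the idea of code metrics as auxiliary signals concrete, here is a minimal sketch that computes a few simple metrics for one notebook code cell and serializes them so they could be placed alongside the code in a model input. The metric set, the names `cell_metrics` and `metrics_prefix`, and the `key=value` format are all assumptions for illustration; the excerpt above does not specify which metrics the paper uses or how it feeds them to the models.

```python
import ast

def cell_metrics(source: str) -> dict:
    """Compute a few illustrative metrics for one notebook code cell.

    Assumption: the cell is plain Python (no IPython magics), so ast.parse works.
    """
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return {
        "loc": len(lines),  # non-blank lines, comments included
        "comment_lines": sum(1 for ln in lines if ln.strip().startswith("#")),
        "num_calls": sum(isinstance(n, ast.Call) for n in nodes),
        "num_imports": sum(isinstance(n, (ast.Import, ast.ImportFrom)) for n in nodes),
        "num_branches": sum(isinstance(n, (ast.If, ast.For, ast.While)) for n in nodes),
    }

def metrics_prefix(metrics: dict) -> str:
    """Serialize metrics as 'key=value' pairs (hypothetical input format)."""
    return " ".join(f"{key}={value}" for key, value in sorted(metrics.items()))

# Example cell, as it might appear in a data-science notebook.
cell = (
    "import pandas as pd\n"
    "# load data\n"
    "df = pd.read_csv('data.csv')\n"
    "for c in df.columns:\n"
    "    print(c)\n"
)

metrics = cell_metrics(cell)
model_input = metrics_prefix(metrics) + "\n" + cell
```

Under these assumptions, `model_input` begins with a line of metric values followed by the cell's code. A real pipeline would also need to handle cells that fail to parse, such as those containing shell escapes or magics.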
Taxonomy
Topics: Software Engineering Research · Teaching and Learning Programming · Scientific Computing and Data Management
