Integrating Code Metrics into Automated Documentation Generation for Computational Notebooks

Mojtaba Mostafavi Ghahfarokhi; Hamed Jahantigh; Alireza Asadi; Abbas Heydarnoori

arXiv:2602.08133·cs.SE·February 10, 2026

Open Access

TL;DR

This paper explores how integrating code metrics into neural and LLM-based models improves automated documentation generation for computational notebooks, leading to more accurate and contextually relevant documentation.

Contribution

It introduces a specialized dataset and evaluates two modeling paradigms, demonstrating that code metrics enhance documentation quality in automated systems.

Findings

01. Incorporating code metrics improves BLEU-1 by 6%.

02. Code metrics increase ROUGE-L F1 by 3%.

03. Code metrics boost BERTScore F1 by 9%.

Abstract

Effective code documentation is essential for collaboration, comprehension, and long-term software maintainability, yet developers often neglect it due to its repetitive nature. Automated documentation generation has evolved from heuristic and rule-based methods to neural network-based and large language model (LLM)-based approaches. However, existing methods often overlook structural and quantitative characteristics of code that influence readability and comprehension. Prior research suggests that code metrics capture information relevant to program understanding. Building on these insights, this paper investigates the role of source code metrics as auxiliary signals for automated documentation generation, focusing on computational notebooks, a popular medium among data scientists that integrates code, narrative, and results but suffers from inconsistent documentation. We propose a…
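To make the idea of code metrics as auxiliary signals concrete, here is a minimal Python sketch that computes a few simple metrics for one notebook code cell and attaches them to a documentation prompt. It is an illustration only, not the authors' pipeline: the metric set (`loc`, `comment_lines`, `num_calls`, `num_imports`, `branching`), the helper names `cell_metrics` and `build_prompt`, and the prompt layout are all assumptions made for this example.

```python
import ast


def cell_metrics(source: str) -> dict:
    """Compute a few illustrative metrics for one notebook code cell.

    The metric set is an assumption for this sketch; the paper's actual
    metrics are not listed in the abstract.
    """
    tree = ast.parse(source)
    lines = [ln for ln in source.splitlines() if ln.strip()]
    nodes = list(ast.walk(tree))
    # Node types counted as decision points for a rough branching estimate.
    branch_types = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)
    return {
        "loc": len(lines),  # non-blank lines, comments included
        "comment_lines": sum(1 for ln in lines if ln.lstrip().startswith("#")),
        "num_calls": sum(isinstance(n, ast.Call) for n in nodes),
        "num_imports": sum(isinstance(n, (ast.Import, ast.ImportFrom)) for n in nodes),
        "branching": 1 + sum(isinstance(n, branch_types) for n in nodes),
    }


def build_prompt(source: str, metrics: dict) -> str:
    """Prepend the metrics to the cell as a plain-text auxiliary signal (hypothetical format)."""
    metric_text = ", ".join(f"{name}={value}" for name, value in metrics.items())
    return (
        "Write a short markdown description for this notebook cell.\n"
        f"Code metrics: {metric_text}\n"
        f"Code:\n{source}"
    )


cell = (
    "import pandas as pd\n"
    "# load data\n"
    "df = pd.read_csv('data.csv')\n"
    "for col in df.columns:\n"
    "    if df[col].isna().any():\n"
    "        print(col)\n"
)
metrics = cell_metrics(cell)
prompt = build_prompt(cell, metrics)
```

The resulting `prompt` string could be passed to any text-generation model. A neural encoder-decoder setup would more likely consume the metrics as numeric features rather than as text, which this sketch does not cover.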

Taxonomy

Topics: Software Engineering Research · Teaching and Learning Programming · Scientific Computing and Data Management