Prioritizing documentation effort: Can we do better?
Shiran Liu, Zhaoqiang Guo, Yanhui Li, Hongmin Lu, Lin Chen, Lei Xu, Yuming Zhou, Baowen Xu

TL;DR
This paper introduces an unsupervised PageRank-based method to prioritize software modules for documentation, outperforming previous supervised neural network approaches and eliminating the need for labeled training data.
Contribution
The paper proposes a novel unsupervised PageRank method for documentation prioritization, demonstrating its effectiveness over supervised neural network models on large datasets.
Findings
PageRank outperforms ANN in prioritization accuracy
The approach requires no training data
Effective on large, real-world datasets
Abstract
Code documentation is essential for software quality assurance, but due to time or economic pressures, developers are often unable to write documentation for all modules in a project. Recently, a supervised artificial neural network (ANN) approach was proposed to prioritize important modules for documentation effort. However, as a supervised approach, it requires labeled training data to train the prediction model, which may not be easy to obtain in practice. Furthermore, it is unclear whether the ANN approach generalizes, as it was evaluated only on several small data sets. In this paper, we propose an unsupervised approach based on PageRank to prioritize documentation effort. This approach identifies "important" modules based only on the dependence relationships between modules in a project. As a result, the PageRank approach does not need any training data to build…
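To make the idea concrete, the following is a minimal sketch of ranking modules with PageRank over a dependency graph. It is not the authors' implementation: the abstract does not say how the dependence graph is built, how edges are weighted, or which damping factor is used, so the unweighted "depends on" edges, the 0.85 damping value, and the module names (`app`, `cli`, `core`, `utils`) are all assumptions chosen for illustration.

```python
def pagerank(deps, damping=0.85, tol=1e-10, max_iter=200):
    """Return a PageRank score per module.

    deps maps each module to the list of modules it depends on
    (an edge src -> target means "src depends on target").
    The damping value and edge direction are illustrative assumptions.
    """
    nodes = sorted(set(deps) | {t for targets in deps.values() for t in targets})
    n = len(nodes)
    rank = {m: 1.0 / n for m in nodes}

    for _ in range(max_iter):
        # Modules with no outgoing dependencies spread their score uniformly.
        dangling = sum(rank[m] for m in nodes if not deps.get(m))
        new_rank = {m: (1.0 - damping) / n + damping * dangling / n for m in nodes}

        # Each module passes an equal share of its score to every dependency.
        for src in nodes:
            targets = deps.get(src, [])
            if targets:
                share = damping * rank[src] / len(targets)
                for target in targets:
                    new_rank[target] += share

        delta = sum(abs(new_rank[m] - rank[m]) for m in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def prioritize(deps, **kwargs):
    """List modules from highest to lowest score (ties broken by name)."""
    rank = pagerank(deps, **kwargs)
    return sorted(rank, key=lambda m: (-rank[m], m))


# Hypothetical four-module project, used only as an example input.
example_deps = {
    "app": ["core", "utils"],
    "cli": ["core"],
    "core": ["utils"],
    "utils": [],
}
documentation_order = prioritize(example_deps)
```

In this toy graph, modules that many others depend on (directly or transitively) rise to the top of `documentation_order`, and no labeled data is involved at any point, which is the property the abstract emphasizes.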
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Open Source Software Innovations
