TL;DR
This paper introduces semi-automated and automated methods for extracting subroutine summaries from unstructured code comments, addressing the challenge of generating documentation for large, unannotated legacy codebases.
Contribution
It proposes novel crowdsourcing and automation techniques for extracting subroutine summaries without requiring prior annotations.
Findings
Validated both approaches through experiments
Provided cost estimates for annotating summaries at large scale
Demonstrated effectiveness at extracting summaries from unstructured comments
Abstract
Summary descriptions of subroutines are short (usually one-sentence) natural language explanations of a subroutine's behavior and purpose in a program. These summaries are ubiquitous in documentation, and many tools such as JavaDocs and Doxygen generate documentation built around them. And yet, extracting summaries from unstructured source code repositories remains a difficult research problem -- it is very difficult to generate clean structured documentation unless the summaries are annotated by programmers. This becomes a problem in large repositories of legacy code, since it is cost prohibitive to retroactively annotate summaries in dozens or hundreds of old programs. Likewise, it is a problem for creators of automatic documentation generation algorithms, since these algorithms usually must learn from large annotated datasets, which do not exist for many programming languages. In…
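To make the difficulty concrete, here is a minimal sketch of the kind of naive heuristic that documentation tools rely on when comments follow a convention. It is not the paper's method. It assumes a simplified first-sentence rule loosely modeled on the Javadoc convention: strip comment markers, stop at the first block tag such as `@param`, and take text up to the first period followed by whitespace. The function name `naive_summary` is hypothetical. The second usage case shows how the rule breaks on an unstructured comment.

```python
import re

# Hypothetical, simplified first-sentence rule (loosely modeled on Javadoc):
# the summary ends at the first period followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"\.(\s|$)")


def naive_summary(comment: str) -> str:
    """Return a best-guess one-sentence summary from a raw comment block."""
    lines = []
    for line in comment.splitlines():
        # Strip a leading comment marker such as /**, */, *, //, or #.
        stripped = re.sub(r"^\s*(/\*+|\*/|\*|//+|#+)?\s?", "", line)
        # Strip a trailing block-comment terminator.
        stripped = re.sub(r"\s*\*/\s*$", "", stripped)
        if stripped.startswith("@"):
            break  # block tags (@param, @return) end the description
        lines.append(stripped)
    text = " ".join(part for part in lines if part).strip()
    match = _SENTENCE_END.search(text)
    return text[: match.start() + 1] if match else text


javadoc_comment = """/**
 * Computes the checksum of a buffer. Uses CRC32
 * internally.
 * @param buf the buffer
 */"""

print(naive_summary(javadoc_comment))
print(naive_summary("// TODO fix this hack. Returns the index of x."))
```

On the structured comment the heuristic returns "Computes the checksum of a buffer.", but on the unstructured one it returns "TODO fix this hack.", which is not a summary at all. Abbreviations such as "e.g." would also cut a sentence short. Failures like these are why the abstract argues that clean summaries are hard to obtain without programmer annotation.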
