Code to Comment "Translation": Data, Metrics, Baselining & Evaluation
David Gros, Hariharan Sezhiyan, Prem Devanbu, Zhou Yu

TL;DR
This paper critically examines the use of translation models and metrics for code comment generation. It compares code-comment datasets with a natural language translation dataset, proposes baselines, and argues for improved methods and evaluation standards.
Contribution
The paper compares code-comment datasets with natural language translation datasets and evaluates how well current models and metrics suit the task. It proposes information retrieval (IR) baselines and outlines directions for future research.
Findings
Code-comment datasets differ from natural language translation datasets such as WMT19.
BLEU scores may require calibration for code comment tasks.
Simple IR methods serve as reasonable baselines for comment generation.
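To make the idea of an IR baseline concrete, here is a minimal sketch: given a new piece of code, find the most similar code in a training set and reuse its comment unchanged. This summary does not describe the paper's actual baselines, so the tokenizer, the Jaccard similarity measure, the function names, and the toy corpus below are all assumptions chosen for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]", text.lower())


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def retrieve_comment(query_code, corpus):
    """Return the comment paired with the corpus code most similar to query_code.

    corpus is a list of (code, comment) pairs. Nothing is generated:
    the "prediction" is simply a comment copied from the training data.
    """
    if not corpus:
        raise ValueError("corpus must contain at least one (code, comment) pair")
    query_tokens = tokenize(query_code)
    _, best_comment = max(
        corpus, key=lambda pair: jaccard(query_tokens, tokenize(pair[0]))
    )
    return best_comment


# Hypothetical toy corpus, not taken from any of the datasets named in the abstract.
TOY_CORPUS = [
    ("def add(a, b): return a + b", "Return the sum of two numbers."),
    ("def read_file(path): return open(path).read()", "Read a file and return its contents."),
]

if __name__ == "__main__":
    print(retrieve_comment("def add(x, y): return x + y", TOY_CORPUS))
```

A retrieval baseline like this involves no training, so it gives a cheap reference point: a learned model that cannot score clearly above it on the chosen metric is hard to justify.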
Abstract
The relationship of comments to code, and in particular, the task of generating useful comments given the code, has long been of interest. The earliest approaches have been based on strong syntactic theories of comment-structures, and relied on textual templates. More recently, researchers have applied deep learning methods to this task, and specifically, trainable generative translation models which are known to work very well for Natural Language translation (e.g., from German to English). We carefully examine the underlying assumption here: that the task of generating comments sufficiently resembles the task of translating between natural languages, and so similar models and evaluation metrics could be used. We analyze several recent code-comment datasets for this task: CodeNN, DeepCom, FunCom, and DocString. We compare them with WMT19, a standard dataset frequently used to train…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
