On the Evaluation of Neural Code Summarization

Ensheng Shi; Yanlin Wang; Lun Du; Junjie Chen; Shi Han; Hongyu Zhang; Dongmei Zhang; Hongbin Sun

arXiv:2107.07112 · cs.SE · February 14, 2022

TL;DR

This paper systematically analyzes neural code summarization models, highlighting the impact of evaluation metrics, pre-processing, and dataset characteristics, and provides guidelines and tools for more reliable future research.

Contribution

It offers an in-depth evaluation of current models, reveals often-overlooked factors that affect performance, and proposes best practices and a toolbox for future research.

Findings

01

BLEU variants significantly influence evaluation results.

02

Pre-processing choices can alter performance by -18% to +25%.

03

Dataset characteristics impact model evaluation and ranking.
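The first finding says that the choice of BLEU variant changes evaluation results. The sketch below illustrates why, under stated assumptions. It is not the authors' toolbox and does not reproduce any of the paper's six specific variants. It implements BLEU from scratch with uniform 1- to 4-gram weights and a single reference, then scores the same toy outputs in three common ways: averaging unsmoothed sentence-level scores, averaging sentence-level scores with a simple add-one smoothing on orders n ≥ 2 (a scheme assumed here for illustration), and pooling counts at corpus level. The summaries are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_matches(hyp: Sequence[str], ref: Sequence[str], n: int) -> Tuple[int, int]:
    """Return (clipped n-gram matches, number of hypothesis n-grams)."""
    hyp_counts = ngram_counts(hyp, n)
    ref_counts = ngram_counts(ref, n)
    matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matches, max(len(hyp) - n + 1, 0)


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: 1 if the hypothesis is longer than the reference."""
    if hyp_len == 0:
        return 0.0
    return 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)


def sentence_bleu(hyp: Sequence[str], ref: Sequence[str], max_n: int = 4,
                  smooth: bool = False) -> float:
    """BLEU for one hypothesis/reference pair with uniform n-gram weights.

    smooth=True adds 1 to the match and total counts for n >= 2 (a simple
    add-one scheme assumed for illustration), so a missing 4-gram no longer
    forces the whole sentence score to 0.
    """
    log_precision = 0.0
    for n in range(1, max_n + 1):
        matches, total = clipped_matches(hyp, ref, n)
        if smooth and n > 1:
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision += math.log(matches / total)
    return brevity_penalty(len(hyp), len(ref)) * math.exp(log_precision / max_n)


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU: pool counts and lengths over all pairs, then combine once."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        matches = total = 0
        for hyp, ref in zip(hyps, refs):
            m, t = clipped_matches(hyp, ref, n)
            matches += m
            total += t
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total)
    hyp_len = sum(len(h) for h in hyps)
    ref_len = sum(len(r) for r in refs)
    return brevity_penalty(hyp_len, ref_len) * math.exp(log_precision / max_n)


# Hypothetical generated summaries and references (not from the paper's datasets).
hyps = ["returns the name of the file".split(), "close the input stream".split()]
refs = ["returns the name of the file".split(), "closes the underlying input stream".split()]

avg_plain = sum(sentence_bleu(h, r) for h, r in zip(hyps, refs)) / len(hyps)
avg_smooth = sum(sentence_bleu(h, r, smooth=True) for h, r in zip(hyps, refs)) / len(hyps)
corpus = corpus_bleu(hyps, refs)

print(f"average sentence BLEU, no smoothing: {avg_plain:.3f}")   # 0.500
print(f"average sentence BLEU, add-one:      {avg_smooth:.3f}")  # 0.695
print(f"corpus BLEU:                         {corpus:.3f}")      # 0.690
```

The same two outputs receive three different scores. The second summary shares no 3-gram or 4-gram with its reference, so its unsmoothed sentence BLEU is 0. Smoothing and corpus-level pooling each soften that zero in a different way. Differences like these can also reorder models whose scores are close.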

Abstract

Source code summaries are important for program comprehension and maintenance. However, many programs have missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To understand how far we are from solving this problem and to offer suggestions for future research, in this paper we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets. The evaluation results show that some important factors have a great influence on model evaluation, especially on the performance of the models and the ranking among them. However, these factors can be easily overlooked. Specifically, (1) the BLEU metric…
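The abstract mentions 4 pre-processing operations and their combinations but does not name them here. The sketch below shows, under stated assumptions, how such operations can move a score. It assumes two operations that are common in code summarization pipelines: splitting camelCase/snake_case identifiers and lowercasing. These may not match the paper's exact set. Both the reference and the generated summary are pre-processed the same way, and the score is clipped unigram precision (the first factor of BLEU). The summaries and all helper names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Sequence

# Sub-token pattern: an acronym before a capitalized word ("HTTP" in "HTTPServer"),
# a capitalized or lowercase word, a remaining uppercase run, or a digit run.
SUBTOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(token: str) -> List[str]:
    """Split snake_case and camelCase identifiers into sub-tokens.

    Characters outside [A-Za-z0-9] are dropped; a token with no alphanumeric
    sub-tokens (e.g. punctuation) is kept unchanged.
    """
    parts: List[str] = []
    for piece in token.split("_"):
        parts.extend(SUBTOKEN.findall(piece))
    return parts or [token]


def preprocess(tokens: Sequence[str], split_ids: bool = False,
               lowercase: bool = False) -> List[str]:
    """Apply the selected (hypothetical) pre-processing operations to a token list."""
    out: List[str] = []
    for tok in tokens:
        pieces = split_identifier(tok) if split_ids else [tok]
        if lowercase:
            pieces = [p.lower() for p in pieces]
        out.extend(pieces)
    return out


def unigram_precision(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Clipped unigram precision, the first factor of BLEU."""
    ref_counts = Counter(ref)
    matches = sum(min(c, ref_counts[t]) for t, c in Counter(hyp).items())
    return matches / len(hyp) if hyp else 0.0


# Hypothetical reference summary and generated summary.
reference = "Returns the fileName of this resource".split()
generated = "returns file name".split()

results = {}
for split_ids in (False, True):
    for lowercase in (False, True):
        ref = preprocess(reference, split_ids, lowercase)
        hyp = preprocess(generated, split_ids, lowercase)
        p = unigram_precision(hyp, ref)
        results[(split_ids, lowercase)] = p
        print(f"split={split_ids}, lowercase={lowercase}: {p:.2f}")
```

On this toy pair, the precision rises from 0.00 with no pre-processing to 0.33 with either operation alone and 1.00 with both. The model output is unchanged in every case; only the evaluation pipeline differs.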

Code & Models

Repositories

DeepSoftwareAnalytics/CodeSumEvaluation
Official