On the Evaluation of NLP-based Models for Software Engineering

Maliheh Izadi; Matin Nili Ahmadabadi

arXiv:2203.17166 · cs.SE · April 1, 2022

TL;DR

This paper reviews how NLP-based models for Software Engineering (SE) are evaluated, reveals a lack of standardization, and argues for a consistent evaluation methodology that would enable fair comparisons.

Contribution

It highlights the inconsistency in evaluation protocols for NLP models in SE and emphasizes the necessity of a standardized assessment framework.

Findings

1. Current evaluations are inconsistent and lack standardization.
2. Metrics are often custom-defined and case-specific.
3. No widely accepted evaluation protocol exists in the community.

Abstract

NLP-based models are increasingly used to address SE problems. These models are either applied to the SE domain with little to no change, or heavily tailored to source code and its unique characteristics. Many of these approaches are considered to outperform or complement existing solutions. However, an important question arises: "Are these models evaluated fairly and consistently in the SE community?" To answer this question, we reviewed how researchers evaluate NLP-based models for SE problems. The findings indicate that there is currently no consistent, widely accepted protocol for evaluating these models. Different studies assess different aspects of the same task, metrics are defined by custom choices rather than systematically, and answers are collected and interpreted case by case. …
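
To make the point about custom-defined metrics concrete, the following minimal Python sketch (not taken from the paper or its repository) scores the same toy predictions with two reasonable definitions of exact-match accuracy and gets different results. The function names, the normalization rule (lowercasing and collapsing whitespace), and the example strings are all hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: two plausible "exact match" definitions applied
# to the same outputs. Neither definition is taken from the paper.

def exact_match_strict(preds, refs):
    """Fraction of predictions identical to the reference, character for character."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    hits = sum(p == r for p, r in zip(preds, refs))
    return hits / len(refs)


def exact_match_normalized(preds, refs):
    """Fraction of predictions equal to the reference after lowercasing and
    collapsing runs of whitespace (one common, but not universal, choice)."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")

    def norm(s):
        return " ".join(s.lower().split())

    hits = sum(norm(p) == norm(r) for p, r in zip(preds, refs))
    return hits / len(refs)


# Toy code-summary references and model outputs (invented for illustration).
refs = ["returns the user name", "saves the file to disk",
        "closes the connection", "parses a json string"]
preds = ["Returns the user name", "saves the file  to disk",
         "closes the connection", "parses a JSON object"]

print(f"strict:     {exact_match_strict(preds, refs):.2f}")      # 0.25
print(f"normalized: {exact_match_normalized(preds, refs):.2f}")  # 0.75
```

The same model scores 0.25 under one definition and 0.75 under the other, so a reported number is only comparable across studies when the exact metric definition is reported with it.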

Code & Models

Repositories

MalihehIzadi/nlp4se_eval (official)