CodeBERT-nt: code naturalness via CodeBERT

Ahmed Khanfir; Matthieu Jimenez; Mike Papadakis; Yves Le Traon

arXiv:2208.06042 · cs.SE · August 15, 2022

TL;DR

This paper introduces CodeBERT-nt, a method that uses pre-trained language models to measure code naturalness by predicting masked tokens, improving bug prioritization over traditional methods.
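To make the masked-token idea concrete, the sketch below masks each token of a line in turn and records the probability a model assigns to the original token. It does not use CodeBERT. `ToyMaskedPredictor` is a hypothetical stand-in that guesses a masked token from its two neighbours using smoothed corpus counts, so the example runs without external dependencies. The names `token_confidences` and `line_naturalness`, and averaging as the aggregation, are illustrative assumptions rather than the paper's exact procedure.

```python
from collections import Counter, defaultdict

class ToyMaskedPredictor:
    """Hypothetical stand-in for a masked language model such as CodeBERT.
    Predicts a masked token from its immediate left and right neighbours using
    add-one-smoothed counts of (left, middle, right) token triples."""

    def __init__(self, corpus_lines):
        self.vocab = set()
        self.triples = defaultdict(Counter)  # (left, right) -> counts of middle tokens
        for tokens in corpus_lines:
            seq = ["<s>"] + tokens + ["</s>"]
            self.vocab.update(seq)
            for left, mid, right in zip(seq, seq[1:], seq[2:]):
                self.triples[(left, right)][mid] += 1

    def predict(self, left, right):
        """Probability distribution over the known vocabulary for the masked slot."""
        counts = self.triples.get((left, right), Counter())
        total = sum(counts.values()) + len(self.vocab)
        return {tok: (counts[tok] + 1) / total for tok in self.vocab}

def token_confidences(tokens, model):
    """Mask each token in turn; record the probability assigned to the original token."""
    seq = ["<s>"] + tokens + ["</s>"]
    return [model.predict(seq[i - 1], seq[i + 1]).get(seq[i], 0.0)
            for i in range(1, len(seq) - 1)]

def line_naturalness(tokens, model):
    """Mean confidence over all masked positions: higher means more predictable."""
    conf = token_confidences(tokens, model)
    return sum(conf) / len(conf)

# Tiny, made-up corpus of whitespace-tokenized code lines.
corpus = [line.split() for line in [
    "for i in range ( n ) :",
    "for j in range ( m ) :",
    "x = x + 1",
]]
model = ToyMaskedPredictor(corpus)
print(line_naturalness("for i in range ( n ) :".split(), model))
print(line_naturalness(") range : for ( in".split(), model))
```

A familiar line scores higher than the same tokens in a scrambled order, because the stand-in model has seen its neighbour patterns before. A real masked language model plays the same role with far richer context than two neighbouring tokens.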

Contribution

It proposes a novel approach leveraging pre-trained models for code naturalness estimation, addressing limitations of traditional statistical models and demonstrating improved bug prioritization.

Findings

1. CodeBERT-nt outperforms random and complexity-based ranking techniques.
2. It achieves results comparable to, or slightly better than, those of n-gram models.
3. The approach effectively prioritizes buggy code lines based on naturalness.
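Bug prioritization then amounts to sorting lines so that the least predictable ones are inspected first. The following sketch assumes per-token confidences have already been obtained by masking. The values in the example are hypothetical, not results from the paper. It uses `min` as one possible aggregation, on the assumption that a single surprising token is enough to make a line suspicious. `rank_lines` and `first_hit` are illustrative names.

```python
from statistics import mean

def rank_lines(line_confidences, aggregate=min):
    """Order line ids from least to most natural.

    line_confidences maps a line id to the per-token confidences obtained by
    masking each token of that line; `aggregate` collapses them into one score.
    """
    scores = {line: aggregate(conf) for line, conf in line_confidences.items()}
    return sorted(scores, key=scores.get)

def first_hit(ranking, buggy_lines):
    """1-based rank of the first buggy line (lower means less inspection effort)."""
    for position, line in enumerate(ranking, start=1):
        if line in buggy_lines:
            return position
    return None

# Hypothetical confidences for three lines; line 11 contains one surprising token.
confidences = {
    10: [0.90, 0.80, 0.95],
    11: [0.90, 0.05, 0.85],
    12: [0.70, 0.60, 0.75],
}
ranking = rank_lines(confidences)
print(ranking, first_hit(ranking, {11}))  # → [11, 12, 10] 1
print(rank_lines(confidences, aggregate=mean))
```

Swapping the aggregation (for example `mean` instead of `min`) changes how strongly one unusual token dominates a line's score. Which aggregation works best is an empirical question.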

Abstract

Much of software-engineering research relies on the naturalness of code: the fact that code, in small snippets, is repetitive and can be predicted using statistical language models such as n-grams. Although powerful, training such models on a large code corpus is tedious, time-consuming and sensitive to the code patterns (and practices) encountered during training. Consequently, these models are often trained on small corpora and estimate a naturalness that is relative to a specific programming style or type of project. To overcome these issues, we propose using pre-trained language models to infer code naturalness. Pre-trained models are often built on big data, are easy to use out of the box and include powerful association-learning mechanisms. Our key idea is to quantify code naturalness through its predictability, by using state-of-the-art generative…
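For comparison, the traditional n-gram notion of naturalness is usually measured as cross-entropy, the average negative log-probability per token, here H = -(1/N) Σ log2 P(t_i | t_(i-1)) for a bigram model. The sketch below trains an add-one-smoothed bigram model on a tiny, made-up corpus of whitespace-tokenized lines. The tokenization, smoothing and function names are assumptions chosen for brevity. It illustrates the corpus sensitivity described above: a token sequence never seen in training receives a high cross-entropy.

```python
import math
from collections import Counter

def train_bigram(corpus_lines):
    """Count bigrams and their left contexts over tokenized lines (with a <s> start marker)."""
    vocab, contexts, bigrams = set(), Counter(), Counter()
    for tokens in corpus_lines:
        seq = ["<s>"] + tokens
        vocab.update(seq)
        contexts.update(seq[:-1])           # tokens that are followed by another token
        bigrams.update(zip(seq, seq[1:]))
    return vocab, contexts, bigrams

def cross_entropy(tokens, model):
    """H = -(1/N) * sum(log2 P(t_i | t_{i-1})) under add-one smoothing.
    Lower values mean the line is more natural (more predictable)."""
    vocab, contexts, bigrams = model
    v = len(vocab) + 1                      # +1 reserves probability mass for unseen tokens
    seq = ["<s>"] + tokens
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)

corpus = [line.split() for line in [
    "for i in range ( n ) :",
    "for j in range ( m ) :",
    "x = x + 1",
]]
model = train_bigram(corpus)
familiar = cross_entropy("for k in range ( n ) :".split(), model)
scrambled = cross_entropy(") range : for ( in".split(), model)
print(familiar < scrambled)  # → True
```

The model only knows what its small training corpus contains, which is the limitation that motivates using large pre-trained models instead.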

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability