Code quality assessment using transformers

Mosleh Mahamud; Isak Samsten

arXiv:2309.09264·cs.CL·September 19, 2023

Open Access

TL;DR

This paper explores using transformer-based models, specifically CodeBERT, to automatically evaluate subjective code quality attributes of Java code, such as readability and maintainability, which traditional automated methods struggle to assess.

Contribution

It demonstrates that transformer models with task-specific pre-training can effectively predict code quality, outperforming other techniques on a new dataset.

Findings

01

Transformers can predict code quality to some extent.

02

Task-adapted pre-training improves model performance.

03

Saliency maps help interpret model predictions.

Abstract

Automatically evaluating the correctness of programming assignments is rather straightforward using unit and integration tests. However, programming tasks can be solved in multiple ways, many of which, although correct, are inelegant. For instance, excessive branching, poor naming, or repetitiveness makes the code hard to understand and maintain. These subjective qualities of code are hard to assess automatically using current techniques. In this work, we investigate the use of CodeBERT to automatically assign quality scores to Java code. We experiment with different models and training paradigms. We explore the accuracy of the models on a novel dataset for code quality assessment. Finally, we assess the quality of the predictions using saliency maps. We find that code quality is to some extent predictable and that transformer-based models using task-adapted pre-training can solve the task…
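To make the idea of a saliency map over code tokens concrete, here is a minimal sketch in Python. It is not the authors' method: it uses occlusion (remove one token, re-score, record the change) instead of whatever saliency technique the paper applies, and the scorer `toy_quality_score` is a hypothetical hand-written stand-in for a trained model such as CodeBERT. The penalty values, the keyword list, and all function names are assumptions made for illustration only.

```python
import re

# Assumption: a small set of Java branching keywords for the toy scorer.
BRANCH_KEYWORDS = {"if", "else", "switch", "case", "for", "while"}


def tokenize(code):
    """Split source text into identifiers/keywords, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def toy_quality_score(tokens):
    """Hypothetical stand-in for a learned quality model.

    Starts at 1.0, subtracts 0.1 per branching keyword and
    0.05 per one-letter identifier. A real model would be trained.
    """
    score = 1.0
    for tok in tokens:
        if tok in BRANCH_KEYWORDS:
            score -= 0.1
        elif len(tok) == 1 and tok.isalpha():
            score -= 0.05
    return score


def occlusion_saliency(tokens, score_fn):
    """Saliency of token i = score(all tokens) - score(tokens without token i).

    Negative values mark tokens whose presence lowers the predicted quality.
    """
    base = score_fn(tokens)
    return [
        base - score_fn(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


java_snippet = "if (a > 0) { if (b > 0) { return a + b; } }"
tokens = tokenize(java_snippet)
saliency = occlusion_saliency(tokens, toy_quality_score)

# Rank tokens from most quality-reducing to least.
ranked = sorted(zip(tokens, saliency), key=lambda pair: pair[1])
for tok, value in ranked[:4]:
    print(f"{tok!r}: {value:+.2f}")
```

Because `occlusion_saliency` only needs a function from tokens to a score, the toy scorer could in principle be swapped for a call into a fine-tuned model; the ranking then shows which tokens the model's prediction depends on most.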

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Malware Detection Techniques

Methods: CodeBERT