Code Linting using Language Models

Darren Holden; Nafiseh Kahani

arXiv:2406.19508 · cs.SE · July 24, 2024 · 1 citation

Open Access

TL;DR

This paper explores using large language models to create a versatile, language-independent code linter capable of detecting various issues with high accuracy, aiming to improve over traditional, language-specific linters.

Contribution

It introduces an approach that trains language models to detect multiple issue types in code, and reports high accuracy and greater versatility than conventional, language-specific linters.

Findings

01. Achieved 84.9% accuracy in binary issue detection.

02. Achieved 83.6% accuracy in multi-issue classification.

03. Demonstrated the potential for language models to replace traditional linters.

Abstract

Code linters play a crucial role in developing high-quality software systems by detecting potential problems (e.g., memory leaks) in the source code of systems. Despite their benefits, code linters are often language-specific, focused on certain types of issues, and prone to false positives in the interest of speed. This paper investigates whether large language models can be used to develop a more versatile code linter. Such a linter is expected to be language-independent, cover a variety of issue types, and maintain high speed. To achieve this, we collected a large dataset of code snippets and their associated issues. We then selected a language model and trained two classifiers based on the collected datasets. The first is a binary classifier that detects if the code has issues, and the second is a multi-label classifier that identifies the types of issues. Through extensive…
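
The two classifiers described in the abstract can be illustrated with a minimal sketch. It assumes each classifier emits raw scores (logits) and that the two are chained, with the multi-label stage consulted only when the binary stage flags a snippet. The chaining, the 0.5 threshold, the label names, and the function names are all illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical issue labels; the paper's actual label set is not given here.
ISSUE_LABELS = ["memory_leak", "null_dereference", "unused_variable"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def has_issue(binary_logit: float, threshold: float = 0.5) -> bool:
    """Binary stage: decide whether the snippet has any issue at all."""
    return sigmoid(binary_logit) >= threshold


def issue_types(label_logits: list[float], threshold: float = 0.5) -> list[str]:
    """Multi-label stage: threshold each label independently,
    so a snippet can receive zero, one, or several issue types."""
    return [
        label
        for label, logit in zip(ISSUE_LABELS, label_logits)
        if sigmoid(logit) >= threshold
    ]


def lint(binary_logit: float, label_logits: list[float]) -> list[str]:
    """Assumed cascade: only ask for issue types when the binary stage fires."""
    if not has_issue(binary_logit):
        return []
    return issue_types(label_logits)
```

The key difference from single-label classification is that each issue type gets its own independent sigmoid rather than competing in one softmax, which is what lets a single snippet carry several labels at once.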

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Digital Communication and Language · Natural Language Processing Techniques