MLinter: Learning Coding Practices from Examples - Dream or Reality?
Corentin Latappy, Quentin Perez (Euromov DHM), Thomas Degueule, Jean-Rémy Falleri (IUF), Christelle Urtado (Euromov DHM), Sylvain Vauttier (Euromov DHM), Xavier Blanc, Cédric Teyton

TL;DR
This paper investigates whether coding practices can be learned automatically from developer-tagged examples using machine learning (specifically CodeBERT), and highlights the challenges that data imbalance poses in real-world use.
Contribution
It demonstrates the feasibility of learning coding practices with machine learning and analyzes the challenges faced when applying these models to real-world, unbalanced datasets.
Findings
High precision and recall on synthetic datasets
Severe precision drop on real-world unbalanced codebases
Recall remains high in real-world scenarios
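The findings pair a sharp precision drop with stable recall. The sketch below illustrates, with hypothetical rates that are not taken from the paper, why class imbalance alone can produce that pattern. The function name `precision_recall` and the chosen true-positive and false-positive rates are illustrative assumptions. A classifier whose per-class error rates stay fixed keeps the same recall, but once compliant snippets vastly outnumber violations, even a small false-positive rate yields far more false alarms than true hits.

```python
def precision_recall(n_pos: int, n_neg: int, tpr: float, fpr: float) -> tuple[float, float]:
    """Expected precision and recall for a classifier with fixed per-class rates.

    Hypothetical helper: n_pos counts snippets that violate the practice,
    n_neg counts compliant snippets.
    """
    tp = tpr * n_pos   # violations correctly flagged
    fn = n_pos - tp    # violations missed
    fp = fpr * n_neg   # compliant snippets wrongly flagged
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if n_pos > 0 else 0.0
    return precision, recall


# Hypothetical rates (assumption): 95% of violations caught, 5% of clean snippets flagged.
TPR, FPR = 0.95, 0.05

# Balanced, synthetic-style dataset vs. a codebase where violations are rare.
balanced = precision_recall(n_pos=1000, n_neg=1000, tpr=TPR, fpr=FPR)
imbalanced = precision_recall(n_pos=10, n_neg=9990, tpr=TPR, fpr=FPR)

for name, (p, r) in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name:<10} precision={p:.3f} recall={r:.3f}")
```

With these assumed rates, recall is 0.950 in both settings, while precision falls from 0.950 on the balanced set to roughly 0.019 on the imbalanced one. This is a general property of precision under skewed class distributions, offered as one plausible reading of the findings rather than as the paper's own analysis.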
Abstract
Coding practices are increasingly used by software companies. Their use promotes consistency, readability, and maintainability, which contribute to software quality. Coding practices were initially enforced by general-purpose linters, but companies now tend to design and adopt their own company-specific practices. However, these company-specific practices are often not automated, making it challenging to ensure they are shared and used by developers. Converting these practices into linter rules is a complex task that requires extensive static analysis and language engineering expertise. In this paper, we seek to answer the following question: can coding practices be learned automatically from examples manually tagged by developers? We conduct a feasibility study using CodeBERT, a state-of-the-art machine learning approach, to learn linter rules. Our results show that, although the…
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Advanced Malware Detection Techniques
