Code Smells for Machine Learning Applications
Haiyin Zhang, Luís Cruz, Arie van Deursen

TL;DR
This paper introduces a catalog of 22 machine learning-specific code smells, providing descriptions, potential issues, and solutions to improve code quality in ML applications.
Contribution
It is the first comprehensive catalog of ML-specific code smells, linking them to pipeline stages and offering guidance for better code quality.
Findings
Identified 22 ML-specific code smells from diverse sources.
Linked code smells to specific pipeline stages and long-term issues.
Provided descriptions and solutions for each identified smell.
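To make the idea of an ML-specific code smell concrete, here is a minimal sketch of one pattern often cited in this area: testing for missing values with an equality comparison against NaN. The summary above does not list individual smells, so treating this as an entry of the catalog is an assumption; the function names and the sample data are hypothetical. The sketch follows the same context, issue, solution structure the findings describe.

```python
import math

# Context: counting missing values in a numeric feature column.

def count_missing_smelly(values):
    # Issue: NaN never compares equal to anything, including itself,
    # so this condition is always False and the count is always 0.
    return sum(1 for v in values if v == float("nan"))

def count_missing(values):
    # Solution: use a dedicated NaN check such as math.isnan.
    return sum(1 for v in values if math.isnan(v))

# Hypothetical feature column with two missing entries.
feature = [0.5, float("nan"), 1.2, float("nan")]

print(count_missing_smelly(feature))  # prints 0, silently wrong
print(count_missing(feature))         # prints 2
```

The smelly version raises no error, which is what makes this kind of pattern a long-run risk: missing data goes undetected until it degrades a later pipeline stage.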
Abstract
The popularity of machine learning has grown rapidly in recent years. Machine learning techniques have been studied intensively in academia and applied in industry to create business value. However, there is a lack of guidelines for code quality in machine learning applications. In particular, code smells have rarely been studied in this domain. Although machine learning code is usually integrated as a small part of an overarching system, it often plays an important role in that system's core functionality. Hence, ensuring code quality is essential to avoid issues in the long run. This paper identifies a list of 22 machine learning-specific code smells collected from various sources, including papers, grey literature, GitHub commits, and Stack Overflow posts. For each smell, we describe its context, its potential issues in the long run, and proposed solutions.…
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Advanced Malware Detection Techniques
