Code Smells for Machine Learning Applications

Haiyin Zhang; Luís Cruz; Arie van Deursen

arXiv:2203.13746 · cs.SE · March 31, 2022

TL;DR

This paper introduces a catalog of 22 machine learning-specific code smells, providing descriptions, potential issues, and solutions to improve code quality in ML applications.

Contribution

It is the first comprehensive catalog of ML-specific code smells, linking them to pipeline stages and offering guidance for better code quality.

Findings

01. Identified 22 ML-specific code smells in papers, grey literature, GitHub commits, and Stack Overflow posts.

02. Linked each code smell to a specific pipeline stage and to the issues it can cause in the long run.

03. Provided a description and a proposed solution for each identified smell.

Abstract

The popularity of machine learning has grown rapidly in recent years. Machine learning techniques have been studied intensively in academia and applied in industry to create business value. However, there is a lack of guidelines for code quality in machine learning applications. In particular, code smells have rarely been studied in this domain. Although machine learning code is usually integrated as a small part of an overarching system, it usually plays an important role in the system's core functionality. Hence, ensuring code quality is essential to avoid issues in the long run. This paper proposes and identifies a list of 22 machine learning-specific code smells collected from various sources, including papers, grey literature, GitHub commits, and Stack Overflow posts. We pinpoint each smell with a description of its context, potential issues in the long run, and proposed solutions.…
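To make the "context, potential issue, proposed solution" format concrete, here is a minimal sketch of one kind of smell that tends to appear in data-handling code: testing for missing values by comparing against NaN with `==`. The example, the function names, and the sample data are illustrative assumptions; this page does not reproduce the individual catalog entries.

```python
import math


def count_missing_smelly(values):
    # Smell: NaN never compares equal to anything, including itself,
    # so this condition is always False and missing values go uncounted.
    return sum(1 for v in values if v == float("nan"))


def count_missing_fixed(values):
    # Proposed fix: use a dedicated NaN check instead of an equality test.
    return sum(1 for v in values if math.isnan(v))


# Hypothetical sensor readings with two missing entries.
readings = [0.5, float("nan"), 1.2, float("nan")]

smelly_count = count_missing_smelly(readings)
fixed_count = count_missing_fixed(readings)
print(smelly_count, fixed_count)  # prints: 0 2
```

The smelly version runs without any error or warning, which is why this kind of issue can survive unnoticed and only surface later as skewed statistics or degraded model quality.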

Code & Models

Repositories

hynn01/ml-smells · tf · Official

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Advanced Malware Detection Techniques