When Code Smells Meet ML: On the Lifecycle of ML-specific Code Smells in ML-enabled Systems
Gilberto Recupito, Giammaria Giordano, Filomena Ferrucci, Dario Di Nucci, and Fabio Palomba

TL;DR
This paper investigates the emergence, evolution, and impact of ML-specific code smells in ML-enabled systems through empirical analysis of a large dataset, aiming to improve quality assurance practices.
Contribution
It introduces a plan to empirically analyze ML-specific code smells, including their prevalence, lifecycle, and survivability, using a novel detection tool and an extensive dataset.
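Survivability is usually studied with survival analysis. This page does not say which estimator the study will use, so the sketch below assumes the standard Kaplan-Meier estimator. It also assumes that each smell instance's lifetime is measured in commits, and that instances still present at the last analysed commit are right-censored. The function name `kaplan_meier` and the commit-count unit are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(lifetimes: Sequence[int], removed: Sequence[bool]) -> List[Tuple[int, float]]:
    """Kaplan-Meier survival curve for smell instances (illustrative sketch).

    lifetimes: commits each smell instance survived (assumed unit).
    removed:   True if the smell was removed (event observed), False if it was
               still present at the last analysed commit (right-censored).
    Returns (time, estimated survival probability) at each removal time.
    """
    events = Counter(t for t, r in zip(lifetimes, removed) if r)  # removals per time
    exits = Counter(lifetimes)         # all instances leaving the risk set per time
    at_risk = len(lifetimes)           # instances whose lifetime is >= current time
    survival = 1.0
    curve: List[Tuple[int, float]] = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction that survived time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]            # removed and censored instances both leave
    return curve


if __name__ == "__main__":
    # Five hypothetical smell instances; two were never removed (censored).
    print(kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]))
    # approximately [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Censored instances stay in the risk set up to their last observed commit, which is why smells still alive at the end of the mined history lower the removal rate instead of being discarded.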
Findings
ML smells are prevalent in real systems
Most ML smells are introduced early in development
Some ML smells persist despite refactoring
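You can trace when a smell is introduced or removed by comparing detector output between consecutive commits. The sketch below is minimal. It assumes that a smell instance can be matched across commits by a stable key, such as a hypothetical `(file, smell_type, function)` tuple. `track_smell_lifecycle` is a hypothetical helper, not the study's tool.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple


def track_smell_lifecycle(
    snapshots: List[Tuple[str, Set[Hashable]]],
) -> Dict[Hashable, Tuple[str, Optional[str]]]:
    """Map each smell key to (introducing commit, removing commit or None).

    snapshots: chronologically ordered (commit_id, smells detected at that commit).
    Simplification: a smell that reappears after removal is recorded as a new
    introduction and overwrites its earlier record.
    """
    lifecycle: Dict[Hashable, Tuple[str, Optional[str]]] = {}
    previous: Set[Hashable] = set()
    for commit, current in snapshots:
        for smell in current - previous:    # present now, absent before: introduced
            lifecycle[smell] = (commit, None)
        for smell in previous - current:    # present before, absent now: removed
            introduced, _ = lifecycle[smell]
            lifecycle[smell] = (introduced, commit)
        previous = current
    return lifecycle


if __name__ == "__main__":
    history = [("c1", {"A"}), ("c2", {"A", "B"}), ("c3", {"B"})]
    print(track_smell_lifecycle(history))
    # {'A': ('c1', 'c3'), 'B': ('c2', None)}
```

Smells whose removing commit is `None` are still alive at the end of the history. These are exactly the right-censored cases a survival analysis must account for.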
Abstract
Context. The adoption of Machine Learning (ML)-enabled systems is steadily increasing. Nevertheless, there is a shortage of ML-specific quality assurance approaches, possibly because of the limited knowledge of how quality-related concerns emerge and evolve in ML-enabled systems. Objective. We aim to investigate the emergence and evolution of specific types of quality-related concerns known as ML-specific code smells, i.e., sub-optimal implementation solutions applied to ML pipelines that may significantly decrease both the quality and maintainability of ML-enabled systems. More specifically, we present a plan to study ML-specific code smells by empirically analyzing (i) their prevalence in real ML-enabled systems, (ii) how they are introduced and removed, and (iii) their survivability. Method. We will conduct an exploratory study, mining a large dataset of ML-enabled systems and…
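To make the definition concrete, consider chained indexing in pandas, a frequently discussed ML-specific smell (e.g. `df["col"][mask] = value`). This form may write to a temporary copy instead of the original DataFrame, and the pandas documentation recommends a single `.loc` assignment instead. The sketch below is a hypothetical, purely syntactic detector built on Python's standard `ast` module. It is not the detection tool used in the study. Because it only inspects syntax, it will also flag harmless nested indexing on plain lists or arrays.

```python
import ast
from typing import List


class ChainedIndexingDetector(ast.NodeVisitor):
    """Hypothetical check: flags assignments whose target looks like x[a][b]."""

    def __init__(self) -> None:
        self.findings: List[int] = []  # line numbers of suspicious assignments

    def _check_target(self, target: ast.expr) -> None:
        # Chained assignment: the subscript being assigned to is itself
        # applied to the result of another subscript.
        if isinstance(target, ast.Subscript) and isinstance(target.value, ast.Subscript):
            self.findings.append(target.lineno)

    def visit_Assign(self, node: ast.Assign) -> None:
        for target in node.targets:
            self._check_target(target)
        self.generic_visit(node)

    def visit_AugAssign(self, node: ast.AugAssign) -> None:
        self._check_target(node.target)
        self.generic_visit(node)


def detect_chained_indexing(source: str) -> List[int]:
    """Return the line numbers of chained-indexing assignments in `source`."""
    detector = ChainedIndexingDetector()
    detector.visit(ast.parse(source))
    return detector.findings


if __name__ == "__main__":
    snippet = (
        'df["price"][df["price"] < 0] = 0\n'      # smelly: chained indexing
        'df.loc[df["price"] < 0, "price"] = 0\n'  # recommended single .loc call
    )
    print(detect_chained_indexing(snippet))  # [1]
```

A production detector would need type or data-flow information to confirm that the indexed object is actually a DataFrame. This sketch deliberately avoids that and trades precision for simplicity.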
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Ferroelectric and Negative Capacitance Devices · Model-Driven Software Engineering Techniques
