Empirical Characterization of Logging Smells in Machine Learning Code
Patrick Loic Foalem, Leuson Da Silva, Foutse Khomh, Heng Li, Ettore Merlo

TL;DR
This study empirically analyzes logging smells in machine learning code, revealing their prevalence, types, and impact on system quality, and provides a dataset for future research.
Contribution
It introduces a taxonomy of ML-specific logging smells, analyzes 444 repositories, and validates the relevance of these smells through a practitioner survey.
Findings
Logging smells are widespread in ML systems.
Certain smells significantly impact reproducibility and maintainability.
A publicly available dataset supports future detection research.
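The dataset is meant to support automated detection, so a toy static check may make that idea concrete. The sketch below is hypothetical and is not the authors' detector. It parses Python source with the standard `ast` module and flags logging-style method calls (`.info(...)`, `.warning(...)`, and so on) whose arguments reference a variable or attribute with a secret-looking name. The method set `LOG_METHODS`, the keyword list `SENSITIVE_HINTS`, and the function name `find_sensitive_logging` are assumptions chosen for illustration.

```python
import ast

# Assumed heuristics for illustration only; these are NOT the paper's detection rules.
LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception", "log"}
SENSITIVE_HINTS = ("password", "passwd", "secret", "token", "api_key", "apikey")


def _is_logging_call(node: ast.Call) -> bool:
    # Matches attribute-style calls such as `logger.info(...)` or `logging.warning(...)`.
    return isinstance(node.func, ast.Attribute) and node.func.attr in LOG_METHODS


def find_sensitive_logging(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, name) pairs where a logging call receives a secret-looking name."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and _is_logging_call(node)):
            continue
        # Inspect positional and keyword arguments, including names inside f-strings.
        for arg in list(node.args) + [kw.value for kw in node.keywords]:
            for sub in ast.walk(arg):
                if isinstance(sub, ast.Name):
                    name = sub.id
                elif isinstance(sub, ast.Attribute):
                    name = sub.attr
                else:
                    continue
                if any(hint in name.lower() for hint in SENSITIVE_HINTS):
                    findings.add((node.lineno, name))
    return sorted(findings)


# Hypothetical sample input; the code is only parsed, never executed.
SAMPLE = '''
import logging
logger = logging.getLogger(__name__)
api_key = load_key()
logger.info("using key %s", api_key)
logger.info(f"epoch={epoch} loss={loss:.3f}")
print(cfg.password)
'''

if __name__ == "__main__":
    print(find_sensitive_logging(SAMPLE))
```

In the sample, only line 5 is reported. The `print(cfg.password)` call is missed because the heuristic inspects only attribute-style logging calls, which is one of the gaps a more complete detector would need to cover.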
Abstract
Logging plays a central role in ensuring reproducibility, observability, and reliability in machine learning (ML) systems. While logging is generally considered a good engineering practice, poorly designed logging can negatively affect experiment tracking, security, debugging, and system performance. In this paper, we present an empirical study of logging smells in ML projects and propose a taxonomy of ML-specific logging smell types. We conducted a large-scale analysis of 444 ML repositories and manually labeled 2,448 instances of logging smells. Based on this analysis, we identified 12 categories of logging smells spanning security, metric management, configuration, verbosity, and context-related issues. Our results show that logging smells are widespread in ML systems and vary in frequency and manifestation across projects. To assess practical relevance, we conducted a survey…
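The following minimal sketch contrasts smelly and cleaner logging with Python's standard `logging` module, to make the security and metric/context categories more concrete. The specific smells shown are illustrative assumptions, not examples drawn from the paper's labeled dataset: a credential interpolated into a message, and a metric logged without epoch, step, or run context. The `RedactSecretsFilter` class, the `ListHandler` and `log_metric` helpers, and the logger name are hypothetical.

```python
import logging
import re


class RedactSecretsFilter(logging.Filter):
    """Illustrative filter that masks values of secret-looking keys in log messages."""

    # Assumed heuristic: redact `key=value` / `key: value` pairs whose key looks sensitive.
    _pattern = re.compile(r"(?i)\b(api_key|token|password|secret)\s*[=:]\s*\S+")

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so %-style args are included in the check.
        message = record.getMessage()
        record.msg = self._pattern.sub(lambda m: f"{m.group(1)}=***", message)
        record.args = None  # the message is already fully formatted
        return True  # never drop the record, only redact it


class ListHandler(logging.Handler):
    """Collects formatted messages in memory (used here only for the demo)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


def log_metric(logger: logging.Logger, name: str, value: float, *,
               epoch: int, step: int, run_id: str) -> None:
    """Log a metric together with the context needed to trace it back to a run."""
    logger.info("metric=%s value=%.4f epoch=%d step=%d run_id=%s",
                name, value, epoch, step, run_id)


def demo() -> list[str]:
    logger = logging.getLogger("train_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = ListHandler()
    handler.addFilter(RedactSecretsFilter())
    logger.addHandler(handler)

    # Smelly: a credential is interpolated straight into the log message.
    logger.info("Connecting to tracking server with api_key=%s", "sk-live-123")
    # Smelly: a bare metric with no epoch, step, or run context.
    logger.info("loss %s", 0.4321)
    # Cleaner: the same metric with explicit context.
    log_metric(logger, "val_loss", 0.4321, epoch=3, step=1200, run_id="run-42")

    logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The filter is attached to the handler rather than the logger because logger-level filters are consulted only for records logged directly on that logger, while handler-level filters also see records propagated from child loggers. The regex is a deliberately simple heuristic and would miss secrets that are not written as `key=value` pairs.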
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Machine Learning and Data Classification
