Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps
Joran Leest, Ilias Gerostathopoulos, Patricia Lago, Claudia Raibulet

TL;DR
This paper investigates current practices in monitoring and observability of machine learning systems, revealing gaps and providing empirical insights into what practitioners track to ensure model reliability.
Contribution
It provides the first empirical analysis of ML observability practices through focus groups, identifying gaps and suggesting directions for improved tooling and research.
Findings
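Practitioners systematically capture information across ML systems and their environment, and use it to validate models, detect and diagnose faults, and explain observed degradations.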
Current practices have notable gaps in monitoring and fault detection.
The identified gaps carry implications for tooling design and for research on establishing ML observability practices.
Abstract
Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is a lack of empirical evidence of what practitioners actually capture. This study presents empirical results on ML observability in practice, drawn from seven focus group sessions across several domains. We catalog the information practitioners systematically capture across ML systems and their environment, and map how they use it to validate models, detect and diagnose faults, and explain observed degradations. Finally, we identify gaps in current practice and outline implications for tooling design and research to establish ML observability practices.
