Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps

Joran Leest; Ilias Gerostathopoulos; Patricia Lago; Claudia Raibulet

arXiv:2510.24142·cs.SE·October 29, 2025

TL;DR

This paper investigates how practitioners monitor and observe production machine learning systems. Drawing on seven focus group sessions across several domains, it catalogs what practitioners capture, how they use it, and where current practice falls short.

Contribution

The paper provides an empirical analysis of ML observability practices based on focus groups, identifying gaps in current practice and outlining implications for tooling design and research.

Findings

01

Practitioners systematically capture information across ML systems and their environment.

02

Current practices have notable gaps in monitoring and fault detection.

03

Insights inform future tooling and research directions.

Abstract

Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is a lack of empirical evidence on what practitioners actually capture. This study presents empirical results on ML observability in practice, drawn from seven focus group sessions across several domains. We catalog the information practitioners systematically capture across ML systems and their environment, and map how they use it to validate models, detect and diagnose faults, and explain observed degradations. Finally, we identify gaps in current practice and outline implications for tooling design and research to establish ML observability practices.
