Towards a consistent interpretation of AIOps models
Yingzhe Lyu, Gopi Krishnan Rajbahadur, Dayi Lin, Boyuan Chen, Zhen Ming (Jack) Jiang

TL;DR
This paper investigates how consistent AIOps model interpretations are along three dimensions (internal, external, and time consistency) and provides guidelines for making interpretations more reliable in IT operations.
Contribution
The paper systematically studies interpretation consistency in AIOps models, identifies the factors that affect it, and proposes best practices for practitioners.
Findings
Models with higher AUC (above 0.75) yield more consistent interpretations.
Controlling randomness improves interpretation consistency.
The Sliding Window and Full History approaches yield more consistent interpretations.
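The findings above turn on whether different models agree on feature importance. As a minimal sketch of how such agreement could be quantified, the snippet below computes the Jaccard overlap of the top-k features from two importance rankings. This is an illustrative assumption, not the metric used in the paper; the function names, feature names, and scores are all hypothetical.

```python
def top_k_features(importance, k):
    """Return the set of the k features with the highest importance scores."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(importance, key=importance.get, reverse=True)
    return set(ranked[:k])


def top_k_overlap(importance_a, importance_b, k):
    """Jaccard overlap of the top-k feature sets of two models (0.0 to 1.0)."""
    top_a = top_k_features(importance_a, k)
    top_b = top_k_features(importance_b, k)
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical importance scores from three models trained on the same task
# (for example, the same learner with different random seeds).
model_a = {"cpu_request": 0.40, "memory_request": 0.30,
           "priority": 0.20, "scheduling_class": 0.10}
model_b = {"cpu_request": 0.35, "priority": 0.33,
           "memory_request": 0.22, "scheduling_class": 0.10}
model_c = {"scheduling_class": 0.50, "priority": 0.30,
           "cpu_request": 0.15, "memory_request": 0.05}

overlap_ab = top_k_overlap(model_a, model_b, k=2)  # models share one top-2 feature
overlap_ac = top_k_overlap(model_a, model_c, k=2)  # models share no top-2 feature

print(f"A vs B: {overlap_ab:.2f}")
print(f"A vs C: {overlap_ac:.2f}")
```

A score near 1.0 means the two models point practitioners at the same indicators; a score near 0.0 means their interpretations disagree. A rank correlation over the full feature list would be a reasonable alternative to the top-k overlap.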
Abstract
Artificial Intelligence for IT Operations (AIOps) has been adopted in organizations in various tasks, including interpreting models to identify indicators of service failures. To avoid misleading practitioners, AIOps model interpretations should be consistent (i.e., different AIOps models on the same task agree with one another on feature importance). However, many AIOps studies violate established practices in the machine learning community when deriving interpretations, such as interpreting models with suboptimal performance, though the impact of such violations on the interpretation consistency has not been studied. In this paper, we investigate the consistency of AIOps model interpretation along three dimensions: internal consistency, external consistency, and time consistency. We conduct a case study on two AIOps tasks: predicting Google cluster job failures, and Backblaze hard…
