TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data
Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

TL;DR
TVDiag is a novel multimodal failure diagnosis framework for microservice systems that uses task-oriented and view-invariant learning to improve accuracy in locating failures and identifying failure types.
Contribution
It introduces a task-oriented, view-invariant multimodal failure diagnosis framework with contrastive learning and graph data augmentation, addressing the tendency of previous multimodal methods to combine modalities indiscriminately and treat them equally across diagnostic tasks.
Findings
Outperforms state-of-the-art methods in failure diagnosis accuracy.
Achieves at least 55.94% higher HR@1 accuracy.
Increases F1-score by over 4.08%.
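For readers unfamiliar with the HR@1 figure above, the following is a minimal sketch of how a hit ratio at k is commonly computed for root cause localization: the fraction of failure cases whose true root cause appears among the top k ranked candidates. The function name, the service names, and the sample rankings are hypothetical illustrations, and the paper's own evaluation code may differ in details.

```python
from typing import Sequence


def hit_ratio_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    true_root_causes: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of cases whose true root cause is within the top-k candidates.

    ranked_candidates[i] is the ranked list (most suspicious first) produced
    for failure case i; true_root_causes[i] is its labeled root cause.
    """
    if len(ranked_candidates) != len(true_root_causes):
        raise ValueError("one ranking is required per labeled failure case")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not true_root_causes:
        return 0.0
    hits = sum(
        1
        for ranking, truth in zip(ranked_candidates, true_root_causes)
        if truth in ranking[:k]
    )
    return hits / len(true_root_causes)


# Hypothetical rankings for four failure cases (service names are made up).
rankings = [
    ["cart", "db", "web"],
    ["db", "cart", "web"],
    ["web", "db", "cart"],
    ["cart", "web", "db"],
]
labels = ["cart", "cart", "web", "db"]

hr1 = hit_ratio_at_k(rankings, labels, k=1)  # 2 of 4 cases hit at rank 1
hr2 = hit_ratio_at_k(rankings, labels, k=2)  # 3 of 4 cases hit within top 2
```

Under this definition, HR@1 is the strictest variant: a case counts only when the top-ranked candidate is the labeled root cause.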
Abstract
Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. Several failure diagnosis methods have been recently proposed to integrate multimodal data based on deep learning. These methods, however, tend to combine modalities indiscriminately and treat them equally in failure diagnosis, ignoring the relationship between specific modalities and different diagnostic tasks. This oversight hinders the effective utilization of the…
Taxonomy
Topics: Software System Performance and Reliability
