DynaCausal: Dynamic Causality-Aware Root Cause Analysis for Distributed Microservices
Songhan Zhang, Aoyang Fang, Yifan Yang, Ruiyi Cheng, Xiaoying Tang, Pinjia He

TL;DR
DynaCausal is a framework for root cause analysis (RCA) in dynamic microservice systems. It models time-varying service dependencies, reduces noise interference, and improves causal attribution so that the true root cause is ranked more accurately.
Contribution
DynaCausal introduces a dynamic causality-aware approach that unifies multi-modal signals (logs, traces, and metrics), employs contrastive learning, and optimizes causal ranking to address limitations of existing root cause analysis (RCA) methods.
Findings
Achieves an average AC@1 of 0.63, outperforming state-of-the-art methods.
Effectively captures dynamic service dependencies and fault propagation.
Provides accurate and interpretable root cause diagnoses in complex environments.
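For readers unfamiliar with the metric, AC@1 is commonly defined in the RCA literature as the fraction of fault cases in which the true root cause is ranked first. The sketch below shows that common definition only; the summary does not spell out the paper's exact evaluation protocol, and the function name `ac_at_k`, the service names, and the example rankings are all hypothetical.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int = 1) -> float:
    """Fraction of fault cases whose true root cause is among the top-k candidates.

    Assumed definition (common in RCA work), not taken from the paper itself.
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("at least one fault case is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical fault cases: each ranking lists services from most to least suspicious.
rankings = [
    ["cart", "db", "frontend"],
    ["db", "cart", "frontend"],
    ["frontend", "cart", "db"],
    ["cart", "frontend", "db"],
]
truths = ["cart", "cart", "db", "cart"]

top1 = ac_at_k(rankings, truths, k=1)  # 2 of 4 cases rank the true cause first
top2 = ac_at_k(rankings, truths, k=2)  # 3 of 4 cases have it in the top two
top3 = ac_at_k(rankings, truths, k=3)  # all 4 cases have it in the top three
```

Under this definition, an average AC@1 of 0.63 would mean the top-ranked candidate is the true root cause in roughly 63% of fault cases, averaged over the evaluation settings.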
Abstract
Cloud-native microservices enable rapid iteration and scalable deployment but also create complex, fast-evolving dependencies that challenge reliable diagnosis. Existing root cause analysis (RCA) approaches, even with multi-modal fusion of logs, traces, and metrics, remain limited in capturing dynamic behaviors and shifting service relationships. Three critical challenges persist: (i) inadequate modeling of cascading fault propagation, (ii) vulnerability to noise interference and concept drift in normal service behavior, and (iii) over-reliance on service deviation intensity that obscures true root causes. To address these challenges, we propose DynaCausal, a dynamic causality-aware framework for RCA in distributed microservice systems. DynaCausal unifies multi-modal dynamic signals to capture time-varying spatio-temporal dependencies through interaction-aware representation learning.…
