TopoEvo: A Topology-Aware Self-Evolving Multi-Agent Framework for Root Cause Analysis in Microservices
Junle Wang, Xingchuang Liao, Wenjun Wu

TL;DR
TopoEvo is a topology-aware multi-agent framework that improves root cause analysis (RCA) in microservices by combining graph representation learning, structured reasoning, and a self-evolving mechanism to handle noisy, heterogeneous, and drifting observability data.
Contribution
TopoEvo introduces Metric-orthogonal Multimodal Alignment (MOMA) for aligning metrics, logs, and traces, VQ for symptom tokenization, and a self-evolving mechanism, advancing topology-aware RCA with explainability and robustness.
Findings
Outperforms existing RCA methods on microservice datasets.
Effectively reduces modality redundancy and sparsity.
Maintains robustness under topology drift.
Abstract
Root cause analysis (RCA) in microservices is challenging due to (i) noisy and heterogeneous multimodal observability (metrics, logs, traces), (ii) cascading failure propagation that amplifies downstream symptoms, and (iii) non-stationary topology drift induced by autoscaling and rolling updates. Recent LLM-based RCA agents can generate tool-grounded explanations, yet they often remain topology-agnostic and suffer from symptom-amplification bias, misattributing the root cause to salient downstream victims. We propose TopoEvo, a topology-aware self-evolving multi-agent framework that couples graph representation learning with structured, topology-constrained reasoning. TopoEvo first introduces Metric-orthogonal Multimodal Alignment (MOMA), which decomposes metric embeddings into complementary subspaces and contrastively aligns logs and traces to reduce modality…
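The symptom-amplification bias described in the abstract can be illustrated with a toy example. The sketch below is not TopoEvo's method; it is a minimal heuristic under stated assumptions, showing why a topology-agnostic ranking picks a downstream victim while a topology-constrained one does not. The service names, anomaly scores, threshold, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical propagation graph: an edge u -> v means a fault at u can spread to v.
PROPAGATES_TO: Dict[str, List[str]] = {
    "db": ["orders"],
    "orders": ["checkout"],
    "checkout": ["frontend"],
    "frontend": [],
}

# Hypothetical anomaly scores: symptoms grow along the propagation path,
# so the downstream victim "frontend" looks the most anomalous.
ANOMALY: Dict[str, float] = {"db": 0.6, "orders": 0.7, "checkout": 0.8, "frontend": 0.95}


def naive_root_cause(anomaly: Dict[str, float]) -> str:
    """Topology-agnostic baseline: pick the most anomalous service."""
    return max(anomaly, key=anomaly.get)


def topology_aware_root_cause(
    graph: Dict[str, List[str]], anomaly: Dict[str, float], threshold: float = 0.5
) -> str:
    """Pick an anomalous service that no other anomalous service propagates into."""
    anomalous = {s for s, score in anomaly.items() if score >= threshold}
    # Services whose anomaly could be explained by an anomalous source.
    explained = {
        dst for src in anomalous for dst in graph.get(src, []) if dst in anomalous
    }
    candidates = anomalous - explained
    if not candidates:  # e.g. a cycle of anomalous services: fall back to the baseline
        return naive_root_cause(anomaly)
    return max(candidates, key=anomaly.get)


print(naive_root_cause(ANOMALY))                          # frontend
print(topology_aware_root_cause(PROPAGATES_TO, ANOMALY))  # db
```

With these made-up scores, the baseline blames `frontend` because it shows the strongest symptoms, while the topology-constrained variant returns `db`, the only anomalous service with no anomalous source feeding into it.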
