Domain-Adversarial Transfer Learning for Fault Root Cause Identification in Cloud Computing Systems

Bruce Fang; Danyi Gao

arXiv:2507.02233 · cs.DC · July 4, 2025

TL;DR

This paper presents a transfer learning-based algorithm with domain adversarial mechanisms for fault root cause identification in cloud systems, improving accuracy and robustness under challenging conditions like class imbalance and structural differences.

Contribution

Introduces a novel transfer learning approach with shared feature extraction and domain adversarial training for fault diagnosis in complex cloud environments.
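
For orientation, domain adversarial training is commonly written as a minimax objective over a shared feature extractor, a label classifier, and a domain discriminator. The formulation below is the standard one from the domain-adversarial neural network literature, not necessarily the exact loss used in this paper; the symbols $\theta_f$, $\theta_y$, $\theta_d$, $L_y$, $L_d$, and $\lambda$ are assumptions introduced here for illustration.

```latex
% theta_f: shared feature extractor, theta_y: fault classifier,
% theta_d: domain discriminator (all symbols are illustrative assumptions)
E(\theta_f, \theta_y, \theta_d)
  = L_y(\theta_f, \theta_y) - \lambda \, L_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y)
  = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d
  = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Here $L_y$ is the classification loss on labeled source samples and $L_d$ is the loss of the discriminator that tries to tell source features from target features. Minimizing $E$ over $\theta_f$ pushes the shared features toward being uninformative about the domain, which is what allows the classifier to carry over to the target domain.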

Findings

1. Outperforms existing methods in accuracy, F1-Score, and AUC.
2. Maintains high performance under class imbalance and structural heterogeneity.
3. Enhances robustness and generalization in real-world cloud scenarios.

Abstract

This paper addresses the challenge of fault root cause identification in cloud computing environments. The difficulty arises from complex system structures, dense service coupling, and limited fault information. To solve this problem, an intelligent identification algorithm based on transfer learning is proposed. The method introduces a shared feature extraction module and a domain adversarial mechanism to enable effective knowledge transfer from the source domain to the target domain. This improves the model's discriminative ability and generalization performance in the target domain. The model incorporates a pseudo-label selection strategy. When labeled samples are lacking in the target domain, high-confidence predictions are used in training. This enhances the model's ability to recognize minority classes. To evaluate the stability and adaptability of the method in real-world…
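
The pseudo-label selection idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "high-confidence" means the maximum softmax probability meets a fixed threshold; the function names `softmax` and `select_pseudo_labels`, the threshold value 0.9, and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_pseudo_labels(
    logits_batch: List[List[float]],
    threshold: float = 0.9,  # assumed cutoff, not a value from the paper
) -> List[Tuple[int, int, float]]:
    """Return (sample_index, predicted_class, confidence) for each
    unlabeled target sample whose top probability meets the threshold."""
    selected = []
    for idx, logits in enumerate(logits_batch):
        probs = softmax(logits)
        confidence = max(probs)
        label = probs.index(confidence)
        if confidence >= threshold:
            selected.append((idx, label, confidence))
    return selected


# Hypothetical model outputs for three unlabeled target-domain samples.
target_logits = [
    [5.0, 0.0, 0.0],  # confident prediction for class 0
    [0.1, 0.0, 0.2],  # ambiguous, should be skipped
    [0.0, 4.0, 0.0],  # confident prediction for class 1
]
pseudo_labeled = select_pseudo_labels(target_logits)
```

Only the selected samples would be added to the training set with their predicted labels; the ambiguous sample stays unlabeled. A fixed threshold is the simplest choice, and a per-class threshold is one plausible variation when minority classes rarely reach the global cutoff.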
