Illuminating the Black Box: Real-Time Monitoring of Backdoor Unlearning in CNNs via Explainable AI

Tien Dat Hoang

arXiv:2511.21291 · cs.CR · November 27, 2025

Open Access

TL;DR

This paper presents an explainable framework that monitors backdoor unlearning in CNNs in real time, combining Grad-CAM with a new metric, the Trigger Attention Ratio (TAR), to make backdoor removal more transparent and effective.

Contribution

It introduces an unlearning method with integrated explainability and a quantitative attention-shift metric, the Trigger Attention Ratio (TAR), improving both the transparency and the performance of backdoor removal.

Findings

01. Reduces attack success rate from 96.51% to 5.52%.

02. Retains 99.48% of clean accuracy.

03. Provides real-time interpretability during unlearning.

Abstract

Backdoor attacks pose severe security threats to deep neural networks by embedding malicious triggers that force misclassification. While machine unlearning techniques can remove backdoor behaviors, current methods lack transparency and real-time interpretability. This paper introduces a novel framework that integrates Gradient-weighted Class Activation Mapping (Grad-CAM) into the unlearning process to provide real-time monitoring and explainability. We propose the Trigger Attention Ratio (TAR) metric to quantitatively measure the model's attention shift from trigger patterns to legitimate object features. Our balanced unlearning strategy combines gradient ascent on backdoor samples, Elastic Weight Consolidation (EWC) for catastrophic forgetting prevention, and a recovery phase for clean accuracy restoration. Experiments on CIFAR-10 with BadNets attacks demonstrate that our approach…
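
The abstract describes TAR only as a measure of how much attention moves from the trigger to legitimate object features and does not give its formula here. As a minimal sketch, assuming TAR is the share of Grad-CAM heatmap mass that falls inside a known trigger region, the idea could look like the following. The function name `trigger_attention_ratio`, the `trigger_box` parameter, and the toy heatmaps are hypothetical illustrations, not the paper's definitions.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # a 2-D Grad-CAM map as nested lists (assumed layout)


def trigger_attention_ratio(heatmap: Heatmap,
                            trigger_box: Tuple[int, int, int, int]) -> float:
    """Return the fraction of heatmap mass inside trigger_box.

    ASSUMPTION: this ratio is one plausible reading of TAR, not the
    paper's stated formula. trigger_box is (row_start, row_end,
    col_start, col_end) with exclusive end indices.
    """
    r0, r1, c0, c1 = trigger_box
    total = 0.0
    inside = 0.0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            value = max(value, 0.0)  # Grad-CAM maps are ReLU-ed, so clamp negatives
            total += value
            if r0 <= r < r1 and c0 <= c < c1:
                inside += value
    # An all-zero map carries no attention at all, so report 0 rather than divide by zero.
    return inside / total if total > 0.0 else 0.0


# Toy 4x4 heatmaps with a hypothetical 2x2 trigger patch in the bottom-right corner.
box = (2, 4, 2, 4)
before = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 1.0],
          [0.0, 0.0, 1.0, 1.0]]
after = [[1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0, 1.0]]

tar_before = trigger_attention_ratio(before, box)
tar_after = trigger_attention_ratio(after, box)
```

Under this reading, a backdoored model that fixates on the trigger yields a ratio near 1, and the ratio should fall during unlearning as attention spreads back onto the object. Tracking it once per epoch would give the kind of real-time signal the abstract describes.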

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications