Causal Interpretability for Adversarial Robustness: A Hybrid Generative Classification Approach

Chunheng Zhao; Pierluigi Pisu; Gurcan Comert; Negash Begashaw; Varghese Vaidyan; Nina Christine Hubig

arXiv:2412.20025 · cs.CV · December 9, 2025



TL;DR

This paper introduces a hybrid generative-discriminative ensemble model that improves adversarial robustness and interpretability in image classification without relying on adversarial training, and validates it on CIFAR-10, CIFAR-100, and Tiny-ImageNet.

Contribution

The paper proposes a novel deep ensemble that combines a discriminative feature extractor with a generative classifier to improve both interpretability and robustness against adversarial attacks.

Findings

01. Achieves superior robustness against white-box attacks on CIFAR-10 and CIFAR-100.

02. Establishes a correlation between interpretability and adversarial robustness.

03. Demonstrates scalability to more complex datasets such as Tiny-ImageNet.

Abstract

Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the intrinsic vulnerability stemming from the opaque nature of these black-box models. We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness. Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions through a deep latent variable model. Using variational Bayes, our model achieves superior robustness against white-box adversarial attacks without adversarial training. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate our…
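The abstract describes the architecture only at a high level: a frozen, pre-trained discriminative network extracts features, and a generative classifier on top predicts with Bayes' rule, y* = argmax_y [log p(f | y) + log p(y)]. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation. Several assumptions are not from the paper: the backbone is replaced by a fixed random ReLU projection (`extract_features`), and the deep latent variable model trained with variational Bayes is replaced by a per-class linear-Gaussian latent variable model (probabilistic PCA), whose likelihood has a closed form. All names (`extract_features`, `PPCAGenerativeClassifier`, `log_evidence`) are hypothetical.

```python
import numpy as np


def extract_features(x, proj):
    # Hypothetical stand-in for the frozen, pre-trained discriminative backbone:
    # a fixed random projection followed by ReLU (not the paper's network).
    return np.maximum(x @ proj, 0.0)


class PPCAGenerativeClassifier:
    """One probabilistic-PCA latent variable model p(f | y) per class,
    combined with class priors p(y) through Bayes' rule (assumed simplification)."""

    def __init__(self, latent_dim=2):
        self.q = latent_dim

    def fit(self, feats, labels):
        self.classes = np.unique(labels)
        self.params = []
        d = feats.shape[1]
        for c in self.classes:
            fc = feats[labels == c]
            mu = fc.mean(axis=0)
            cov = np.cov(fc, rowvar=False, bias=True)
            evals, evecs = np.linalg.eigh(cov)           # ascending eigenvalues
            evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending
            # Closed-form maximum-likelihood PPCA: noise variance is the mean of
            # the discarded eigenvalues; W spans the top-q principal directions.
            sigma2 = max(evals[self.q:].mean(), 1e-6)
            W = evecs[:, :self.q] * np.sqrt(np.maximum(evals[:self.q] - sigma2, 0.0))
            C = W @ W.T + sigma2 * np.eye(d)             # marginal covariance of f given y
            log_prior = np.log(len(fc) / len(labels))
            self.params.append((mu, C, log_prior))
        return self

    def log_joint(self, feats):
        # Returns an (n_samples, n_classes) array of log p(f | y) + log p(y).
        d = feats.shape[1]
        cols = []
        for mu, C, log_prior in self.params:
            diff = feats - mu
            _, logdet = np.linalg.slogdet(C)
            maha = np.einsum("ij,ij->i", diff, np.linalg.solve(C, diff.T).T)
            cols.append(-0.5 * (maha + logdet + d * np.log(2 * np.pi)) + log_prior)
        return np.stack(cols, axis=1)

    def predict(self, feats):
        # Generative classification: argmax_y [log p(f | y) + log p(y)].
        return self.classes[np.argmax(self.log_joint(feats), axis=1)]

    def log_evidence(self, feats):
        # log p(f) = logsumexp_y [log p(f | y) + log p(y)]; low values flag
        # inputs that the generative model considers unlikely.
        return np.logaddexp.reduce(self.log_joint(feats), axis=1)


# Toy usage on synthetic data (illustrative only; not the paper's datasets).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-3.0, 1.0, size=(200, 5)),
               rng.normal(3.0, 1.0, size=(200, 5))])
y = np.repeat([0, 1], 200)
proj = rng.normal(size=(5, 8))
feats = extract_features(x, proj)

clf = PPCAGenerativeClassifier(latent_dim=2).fit(feats, y)
train_acc = (clf.predict(feats) == y).mean()
# A point far outside the training distribution, pushed along the first projection column.
outlier_feats = extract_features(100.0 * proj[:, :1].T, proj)
print(f"train accuracy: {train_acc:.3f}")
print("outlier log p(f):", clf.log_evidence(outlier_feats)[0])
```

With a deep latent variable model trained by variational Bayes, as the abstract describes, log p(f | y) is typically intractable and would usually be approximated by a variational lower bound (ELBO). That substitution is an assumption here; the abstract does not specify how class scores are computed. The per-class likelihood also yields `log_evidence`, a score that can flag inputs the model finds improbable.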


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis