Causal Interpretability for Adversarial Robustness: A Hybrid Generative Classification Approach
Chunheng Zhao, Pierluigi Pisu, Gurcan Comert, Negash Begashaw, Varghese Vaidyan, Nina Christine Hubig

TL;DR
This paper introduces a hybrid generative-discriminative ensemble model that improves adversarial robustness and interpretability in image classification without relying on adversarial training, validated on CIFAR-10, CIFAR-100, and Tiny-ImageNet.
Contribution
The paper proposes a deep ensemble that combines a discriminative feature extractor with a generative classifier to improve both interpretability and robustness against adversarial attacks.
Findings
Achieves superior robustness against white-box adversarial attacks on CIFAR-10 and CIFAR-100, without adversarial training.
Establishes a correlation between interpretability and adversarial robustness.
Demonstrates scalability to complex datasets like Tiny-ImageNet.
Abstract
Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the intrinsic vulnerability stemming from the opaque nature of these black-box models. We present a deep ensemble model that combines discriminative features with generative models to achieve both high accuracy and adversarial robustness. Our approach integrates a bottom-level pre-trained discriminative network for feature extraction with a top-level generative classification network that models adversarial input distributions through a deep latent variable model. Using variational Bayes, our model achieves superior robustness against white-box adversarial attacks without adversarial training. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate our…
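The two-level design described in the abstract can be illustrated with a toy sketch. This is not the paper's model: `extract_features` is a hypothetical stand-in for the frozen, pre-trained discriminative network, and a per-class diagonal Gaussian over features replaces the deep latent variable model trained with variational Bayes. All function names, the synthetic data, and the parameters here are assumptions made for illustration. What the sketch does show is the generative classification rule itself: score each class by log p(z | y) + log p(y) and pick the largest.

```python
import math
import random

def extract_features(x, scale=0.5, keep=4):
    """Hypothetical stand-in for the frozen, pre-trained discriminative network."""
    return [math.tanh(scale * v) for v in x[:keep]]

def fit_class_gaussians(feats, labels, n_classes, eps=1e-3):
    """Fit a diagonal Gaussian p(z | y) and a prior p(y) for each class.

    Assumption: this simple density replaces the paper's deep latent
    variable model, which is not reproduced here.
    """
    params = []
    for c in range(n_classes):
        rows = [f for f, y in zip(feats, labels) if y == c]
        dims = list(zip(*rows))  # transpose: one tuple per feature dimension
        means = [sum(col) / len(col) for col in dims]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + eps
                     for col, m in zip(dims, means)]
        params.append((means, variances, len(rows) / len(feats)))
    return params

def log_joint(z, means, variances, prior):
    """Return log p(z | y) + log p(y) under a diagonal Gaussian."""
    log_lik = sum(-0.5 * ((v - m) ** 2 / s + math.log(2 * math.pi * s))
                  for v, m, s in zip(z, means, variances))
    return log_lik + math.log(prior)

def predict(x, params):
    """Bayes-rule decision: choose the class with the largest log joint."""
    z = extract_features(x)
    scores = [log_joint(z, *p) for p in params]
    return scores.index(max(scores))

def demo(seed=0):
    """Train on 300 synthetic points, return accuracy on 100 held-out points."""
    rng = random.Random(seed)
    centers = [-1.5, 1.5]  # synthetic two-class data, not a real dataset
    labels = [rng.randrange(2) for _ in range(400)]
    xs = [[centers[y] + rng.gauss(0.0, 1.0) for _ in range(8)] for y in labels]
    feats = [extract_features(x) for x in xs[:300]]
    params = fit_class_gaussians(feats, labels[:300], n_classes=2)
    hits = sum(predict(x, params) == y for x, y in zip(xs[300:], labels[300:]))
    return hits / 100
```

Calling `demo()` returns the held-out accuracy on the synthetic data. Because the decision is made from explicit class-conditional densities, the per-class scores from `log_joint` can be inspected directly, for example to flag inputs that are unlikely under every class.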
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis
