# When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks

**Authors:** Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai

arXiv: 1902.03380 · 2021-10-11

## TL;DR

This paper introduces a causal inference framework for visual reasoning in deep neural networks, combining do-calculus with pixel-wise masking and adversarial perturbations to better understand and detect adversarial examples.

## Contribution

The paper proposes a causal inference approach that applies do-calculus to pixel-level masking and adversarial perturbation, with the aim of improving interpretability and adversarial-example detection in DNNs.
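As general background, and not necessarily the exact definition used in the paper, the effect of a binary intervention is commonly written in do-notation as a contrast between two interventional expectations. Here $X$ is a hypothetical indicator for whether a pixel region is left intact ($X=1$) or masked ($X=0$), and $Y$ is the model's prediction; both symbols are assumptions made for illustration.

$$
\mathrm{ACE} = \mathbb{E}\left[\,Y \mid do(X=1)\,\right] - \mathbb{E}\left[\,Y \mid do(X=0)\,\right]
$$

A large absolute value indicates that forcing the region to be present or absent changes the prediction substantially.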

## Key findings

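- Causal effect (CE) is a competitive and robust index for understanding DNNs when compared with conventional class-activation mappings (CAMs).
- CE shows distinct characteristics under adversarial perturbation, which suggests it can help distinguish adversarially perturbed images from normal images.
- Experiments on the Chest X-Ray-14 dataset support the use of CE for reasoning about human-interpretable features such as symptoms.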
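The adversarially perturbed images in these findings are generated by gradient-based methods. As a rough illustration of what such a method does, the sketch below applies the fast gradient sign method (FGSM) to a toy logistic model. FGSM is chosen here as a familiar example and is not confirmed to be the paper's attack; the model, the names `predict`, `loss`, `fgsm`, and all numeric values are hypothetical.

```python
import math

def predict(x, w, b):
    """Toy logistic model: probability of the positive class."""
    logit = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))

def loss(x, w, b, y):
    """Binary cross-entropy for a single example with label y in {0, 1}."""
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(x, w, b, y, eps):
    """Move each input feature by eps in the direction that increases the loss."""
    p = predict(x, w, b)
    # For this logistic model, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical input, weights, bias, and label.
x = [0.2, 0.7, 0.5]
w = [1.5, -2.0, 0.5]
b = 0.1
y = 1

x_adv = fgsm(x, w, b, y, eps=0.05)
clean_loss = loss(x, w, b, y)
adv_loss = loss(x_adv, w, b, y)
```

The perturbation is bounded by `eps` per feature, yet it raises the loss on the true label. A real image pipeline would usually also clip the result to the valid pixel range, which this sketch omits.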

## Abstract

Discovering and exploiting the causality in deep neural networks (DNNs) are crucial challenges for understanding and reasoning causal effects (CE) on an explainable visual model. "Intervention" has been widely used for recognizing a causal relation ontologically. In this paper, we propose a causal inference framework for visual reasoning via do-calculus. To study the intervention effects on pixel-level features for causal reasoning, we introduce pixel-wise masking and adversarial perturbation. In our framework, CE is calculated using features in a latent space and perturbed prediction from a DNN-based model. We further provide the first look into the characteristics of discovered CE of adversarially perturbed images generated by gradient-based methods (code: https://github.com/jjaacckkyy63/Causal-Intervention-AE-wAdvImg). Experimental results show that CE is a competitive and robust index for understanding DNNs when compared with conventional methods such as class-activation mappings (CAMs) on the Chest X-Ray-14 dataset for human-interpretable feature(s) (e.g., symptom) reasoning. Moreover, CE holds promises for detecting adversarial examples as it possesses distinct characteristics in the presence of adversarial perturbations.
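The pixel-wise masking intervention described in the abstract can be sketched on a toy model. This is a simplified illustration under stated assumptions, not the authors' implementation: it measures only the change in the output prediction, whereas the paper also uses latent-space features. The stand-in model `toy_model`, the helpers `apply_mask` and `patch_effect_map`, the zero fill value, and the example image and weights are all hypothetical.

```python
import math

def toy_model(image, weights):
    """Hypothetical stand-in for a DNN: sigmoid of a weighted pixel sum."""
    logit = sum(p * w
                for row_p, row_w in zip(image, weights)
                for p, w in zip(row_p, row_w))
    return 1.0 / (1.0 + math.exp(-logit))

def apply_mask(image, top, left, size, fill=0.0):
    """Intervene on a square patch by forcing its pixels to `fill`."""
    return [[fill if top <= r < top + size and left <= c < left + size else px
             for c, px in enumerate(row)]
            for r, row in enumerate(image)]

def patch_effect_map(image, weights, size=2):
    """Prediction change caused by masking each non-overlapping patch."""
    baseline = toy_model(image, weights)
    rows, cols = len(image), len(image[0])
    return [[baseline - toy_model(apply_mask(image, r, c, size), weights)
             for c in range(0, cols, size)]
            for r in range(0, rows, size)]

# Hypothetical 4x4 image; the toy model only looks at the top-left patch.
image = [[0.9, 0.8, 0.1, 0.2],
         [0.7, 0.6, 0.3, 0.1],
         [0.2, 0.1, 0.5, 0.4],
         [0.3, 0.2, 0.6, 0.5]]
weights = [[1.0, 1.0, 0.0, 0.0],
           [1.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]

effects = patch_effect_map(image, weights)
```

In this toy setup only the top-left entry of `effects` is non-zero, because that is the only patch the model depends on. Masking a patch the model ignores leaves the prediction unchanged, which is the intuition behind reading the effect map as an attribution.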

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03380/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1902.03380/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1902.03380/full.md

---
Source: https://tomesphere.com/paper/1902.03380