A Unified Game-Theoretic Interpretation of Adversarial Robustness
Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi, Quanshi Zhang

TL;DR
This paper introduces a unified game-theoretic framework based on multi-order interactions to explain adversarial attacks and defenses in deep neural networks, revealing that attacks target high-order interactions while robustness stems from low-order interactions.
Contribution
It offers a novel unified interpretation of adversarial robustness and attacks through multi-order interactions, providing a principle-based explanation for existing defense methods.
Findings
Adversarial attacks mainly disrupt high-order interactions.
The robustness of adversarially trained DNNs comes from category-specific low-order interactions.
The framework unifies understanding of adversarial perturbations and defenses.
Abstract
This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. Besides, our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features.
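To make "order" concrete, here is a minimal sketch of how an m-th order interaction between two input variables i and j might be computed. It assumes the definition commonly used in the multi-order interaction literature, which is not stated on this page: the average of f(S ∪ {i, j}) - f(S ∪ {i}) - f(S ∪ {j}) + f(S) over all contexts S of size m that contain neither i nor j. The function `multi_order_interaction` and the toy model `toy_and` are hypothetical names for illustration; in a real DNN, f(S) would be the network output with variables outside S masked.

```python
from itertools import combinations
from statistics import mean


def multi_order_interaction(f, n, i, j, m):
    """Average interaction between variables i and j over all contexts S of size m.

    f maps a frozenset of present variable indices to a scalar output.
    Assumed definition (not taken from this page):
        delta(i, j, S) = f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
    """
    others = [k for k in range(n) if k not in (i, j)]
    deltas = []
    for context in combinations(others, m):
        S = frozenset(context)
        deltas.append(f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S))
    return mean(deltas)


# Hypothetical toy "model": outputs 1 only when all four variables are present,
# so variables 0 and 1 interact only in the largest possible context.
def toy_and(S):
    return 1.0 if len(S) == 4 else 0.0


low = multi_order_interaction(toy_and, n=4, i=0, j=1, m=0)   # empty context
high = multi_order_interaction(toy_and, n=4, i=0, j=1, m=2)  # context {2, 3}
print(low, high)
```

In this toy case the interaction between variables 0 and 1 appears only at the highest order (m = 2), which is the kind of large-context collaboration the paper describes as the main target of adversarial attacks. Exact enumeration grows combinatorially with the number of variables, so practical use on images would need sampling of contexts.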
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
