A Unified Game-Theoretic Interpretation of Adversarial Robustness

Jie Ren; Die Zhang; Yisen Wang; Lu Chen; Zhanpeng Zhou; Yiting Chen; Xu Cheng; Xin Wang; Meng Zhou; Jie Shi; Quanshi Zhang

arXiv:2111.03536 · cs.LG · November 9, 2021

TL;DR

This paper introduces a unified game-theoretic framework based on multi-order interactions between input variables to explain adversarial attacks and defenses in deep neural networks. It finds that attacks mainly affect high-order interactions, while the robustness of adversarially trained networks comes from category-specific low-order interactions.

Contribution

It offers a novel unified interpretation of adversarial robustness and attacks through multi-order interactions, providing a principle-based explanation for existing defense methods.

Findings

01. Adversarial attacks mainly affect high-order interactions.

02. The robustness of adversarially trained DNNs comes from category-specific low-order interactions.

03. The framework unifies the understanding of adversarial perturbations and defenses.

Abstract

This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. Besides, our findings also revise the previous inaccurate understanding of the shape bias of adversarially learned features.
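
The abstract does not define the multi-order interaction, so the following is only a minimal sketch under an assumed definition: the order-m interaction between variables i and j is taken to be the average, over all contexts S of size m drawn from the remaining variables, of f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S). The names `multi_order_interaction`, `toy_pair`, and `toy_and` are hypothetical, and the toy set functions stand in for a DNN evaluated on masked inputs; this is not the authors' implementation.

```python
from itertools import combinations


def multi_order_interaction(f, n, i, j, m):
    """Assumed definition: mean of the marginal interaction between i and j
    over every context S of size m taken from the other n - 2 variables."""
    others = [k for k in range(n) if k not in (i, j)]
    if not 0 <= m <= len(others):
        raise ValueError("order m must lie between 0 and n - 2")
    total, count = 0.0, 0
    for context in combinations(others, m):
        S = frozenset(context)
        # Marginal interaction of i and j given that the variables in S are present.
        total += f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)
        count += 1
    return total / count


def toy_pair(S):
    """Hypothetical model: fires whenever variables 0 and 1 are both present."""
    return 1.0 if {0, 1} <= S else 0.0


def toy_and(S):
    """Hypothetical model: fires only when all four variables are present."""
    return 1.0 if len(S) == 4 else 0.0


if __name__ == "__main__":
    for m in range(3):
        print(m,
              multi_order_interaction(toy_pair, 4, 0, 1, m),
              multi_order_interaction(toy_and, 4, 0, 1, m))
```

In this toy setting, `toy_pair` shows the same interaction between variables 0 and 1 at every order because the pair needs no extra context, whereas `toy_and` shows a nonzero interaction only at the highest order (m = 2), where the pair cooperates with all remaining variables. That contrast is the kind of low-order versus high-order distinction the abstract refers to.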

Code & Models

Repositories

jie-ren/a-unified-game-theoretic-interpretation-of-adversarial-robustness
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks