Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

Delyan Boychev

arXiv:2307.02500·cs.CV·November 21, 2023

TL;DR

This paper investigates how adversarial training enhances the interpretability of computer vision models by making their learned features more meaningful and aligned with human understanding, while also improving robustness.
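As a rough illustration of what adversarial training means in practice, the sketch below trains a tiny logistic model on adversarially perturbed inputs. Everything in it is an assumption made for illustration: the two-dimensional toy data, the hyperparameters, and the use of FGSM (a one-step L-infinity attack) as a stand-in, since this summary does not say which attack the paper uses. It is not the paper's or the repository's code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Linear score of a logistic model."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss(w, b, x, y):
    """Binary cross-entropy on one example, with y in {0, 1}."""
    p = min(max(sigmoid(score(w, b, x)), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def fgsm(w, b, x, y, eps):
    """FGSM: shift each input by eps along the sign of dLoss/dx."""
    p = sigmoid(score(w, b, x))
    grad_x = [(p - y) * wi for wi in w]  # analytic input gradient
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]

def adversarial_train(data, eps=0.3, lr=0.1, epochs=30):
    """SGD where every update uses an adversarial version of the example."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)  # attack the current model
            err = sigmoid(score(w, b, x_adv)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x_adv)]
            b -= lr * err
    return w, b

# Hypothetical toy data: two Gaussian blobs, one per class.
rng = random.Random(0)
data = [([rng.gauss(2.0 * (2 * y - 1), 1.0), rng.gauss(2.0 * (2 * y - 1), 1.0)], y)
        for y in (0, 1) for _ in range(50)]

w, b = adversarial_train(data)
x0, y0 = data[0]
x0_adv = fgsm(w, b, x0, y0, eps=0.3)
clean_loss = loss(w, b, x0, y0)
adv_loss = loss(w, b, x0_adv, y0)
```

For a linear model the FGSM step can only raise the loss, so `adv_loss` is never below `clean_loss`; training on such perturbed points is what pushes the model toward robustness within the chosen `eps` budget.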

Contribution

It provides extensive empirical evidence linking adversarial robustness with increased interpretability in deep neural networks for vision tasks.

Findings

01. Robust models are less vulnerable to adversarial attacks.

02. Robust models learn features closer to real, human-understandable ones.

03. Standard models focus on less meaningful image regions.

Abstract

As state-of-the-art deep neural networks grow ever more complex, maintaining their interpretability becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models, that is, models less vulnerable to adversarial attacks. Adversarial training has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Compared to robust models, standard models are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on…
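One of the feature-importance methods named in the abstract, Integrated Gradients, can be sketched in a few lines. The version below is a minimal illustration under stated assumptions: the "model" is a hypothetical three-input sigmoid with made-up weights, gradients are taken by central finite differences rather than autodiff, and the path integral is approximated with a midpoint Riemann sum. It does not reproduce the paper's experimental setup.

```python
import math

def model(x):
    """Hypothetical toy model: sigmoid of a fixed linear score."""
    w = [2.0, -1.0, 0.5]  # illustrative weights, not from the paper
    b = 0.1
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad

def integrated_gradients(f, x, baseline, steps=200):
    """Average the gradient along the straight path from baseline to x,
    then scale each component by (x_i - baseline_i)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # all-zero baseline, a common but arbitrary choice
attributions = integrated_gradients(model, x, baseline)

# Completeness: attributions should sum to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
```

The completeness check is a useful sanity test for any implementation: if the attributions do not sum to the change in model output between baseline and input, the step count is too low or the gradients are wrong.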

Code & Models

Repositories

delyan-boychev/pytorch_trainers_interpretability
PyTorch · Official

Datasets

ilee0022/ImageNet-Subset150
dataset · 29 dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Focus