Interpreting Adversarial Examples with Attributes

Sadaf Gulshad; Jan Hendrik Metzen; Arnold Smeulders; Zeynep Akata

arXiv:1904.08279 · cs.CV · April 18, 2019 · 6 citations


TL;DR

This paper interprets adversarial examples for deep vision models through attributes, i.e. visually discriminative properties of objects, which the model uses to justify its decisions on both clean and adversarial images and which serve as a lens for analyzing robustness under attack.

Contribution

It proposes an attribute-based approach for interpreting and analyzing adversarial examples that lets black-box models justify their decisions on clean and adversarial images in terms of visual attributes.
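To make "justifying a decision with attributes" concrete, here is a minimal sketch assuming a linear compatibility model: each class has an attribute signature, the predicted class is the one whose signature best matches the image's predicted attribute scores, and the justification is the set of attributes contributing most to that match. The function name `predict_with_justification`, the attribute names, and the class signatures are hypothetical toy values; this is not the authors' implementation.

```python
import numpy as np

def predict_with_justification(attr_scores, class_attrs, class_names, attr_names, k=2):
    """Classify via attribute compatibility and justify with the top-k attributes.

    attr_scores: predicted attribute scores for one image, shape (A,)
    class_attrs: per-class attribute signatures, shape (C, A)
    """
    # Compatibility of each class = dot product of the predicted attributes
    # with that class's attribute signature.
    compat = class_attrs @ attr_scores
    y = int(np.argmax(compat))
    # Contribution of each attribute to the winning class's score.
    contrib = attr_scores * class_attrs[y]
    top = np.argsort(contrib)[::-1][:k]
    return class_names[y], [attr_names[i] for i in top]

# Hypothetical toy data: four attributes, two bird classes.
attr_names = ["red plumage", "crested head", "long bill", "webbed feet"]
class_names = ["cardinal", "pelican"]
class_attrs = np.array([[1.0, 1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])
attr_scores = np.array([0.9, 0.6, 0.2, 0.1])

label, why = predict_with_justification(attr_scores, class_attrs, class_names, attr_names)
print(label, why)
```

Because the score is a sum of per-attribute terms, the justification falls directly out of the decision rule rather than being computed by a separate explainer.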

Findings

01

Attributes can effectively explain model decisions on clean and adversarial images.

02

Attribute relevance ranking correlates with decision changes under perturbations.

03

The approach improves understanding of model robustness and decision-making processes.
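Finding 02 can be illustrated with a hedged sketch that ranks attributes by how much their shift between a clean and a perturbed image moved the decision. It assumes the same kind of linear compatibility model as a toy setting: the per-attribute terms below sum exactly to the change in the margin between the adversarial and the clean class. The function `rank_by_class_relevance`, the class signatures, and the attribute scores are hypothetical; the paper's actual ranking procedure may differ.

```python
import numpy as np

def rank_by_class_relevance(a_clean, a_adv, class_attrs, y_clean, y_adv, attr_names):
    """Rank attributes by how much their shift pushed the decision from
    y_clean toward y_adv under a linear compatibility model."""
    # Attribute-space direction separating the adversarial class from the clean one.
    direction = class_attrs[y_adv] - class_attrs[y_clean]
    # Per-attribute change in the margin s(y_adv) - s(y_clean); these terms
    # sum exactly to the total change in that margin.
    delta = (a_adv - a_clean) * direction
    order = np.argsort(delta)[::-1]
    return [(attr_names[i], float(delta[i])) for i in order]

attr_names = ["red plumage", "crested head", "long bill", "webbed feet"]
class_attrs = np.array([[1.0, 1.0, 0.0, 0.0],   # cardinal (hypothetical signature)
                        [0.0, 0.0, 1.0, 1.0]])  # pelican (hypothetical signature)
a_clean = np.array([0.9, 0.6, 0.2, 0.1])  # predicted attributes, clean image
a_adv = np.array([0.4, 0.5, 0.8, 0.3])    # predicted attributes, perturbed image

ranked = rank_by_class_relevance(a_clean, a_adv, class_attrs, 0, 1, attr_names)
for name, d in ranked:
    print(f"{name:>12s}  {d:+.2f}")
```

Attributes at the top of the list are the ones whose change did the most to flip the prediction, which is the kind of relationship the finding describes.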

Abstract

The vulnerability of deep computer vision systems to imperceptible, carefully crafted noise has raised questions about the robustness of their decisions. We take a step back and approach this problem from an orthogonal direction. We propose to enable black-box neural networks to justify their reasoning for both clean and adversarial examples by leveraging attributes, i.e. visually discriminative properties of objects. We rank attributes based on their class relevance, i.e. how the classification decision changes when the input is visually slightly perturbed, and on their image relevance, i.e. how well the attributes can be localized on both clean and perturbed images. We present comprehensive experiments for attribute prediction, adversarial example generation, adversarially robust learning, and their qualitative and quantitative analysis using predicted attributes on three…
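The abstract lists adversarial example generation among the experiments. As one hedged illustration, the sketch below applies the fast gradient sign method (FGSM), a standard one-step attack, to a toy linear attribute-based classifier; it makes no claim about which attacks or models the paper uses. The function `fgsm_attribute_model`, the weights `W`, the class signatures, and `eps = 0.7` are hypothetical, with `eps` chosen large so the decision visibly flips on this tiny example, and clipping to a valid image range is omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_attribute_model(x, y, W, class_attrs, eps):
    """One FGSM step against a toy linear attribute-based classifier.

    attributes   a = W @ x
    class scores s = class_attrs @ a
    loss         cross-entropy of softmax(s) w.r.t. the true label y
    """
    s = class_attrs @ (W @ x)
    p = softmax(s)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    # Chain rule: dL/ds = p - onehot, ds/da = class_attrs, da/dx = W.
    grad_x = W.T @ (class_attrs.T @ (p - onehot))
    # Move each input dimension by eps in the direction that increases the loss.
    return x + eps * np.sign(grad_x)

W = np.eye(4)  # hypothetical: input features map directly onto the 4 attributes
class_attrs = np.array([[1.0, 1.0, 0.0, 0.0],   # cardinal
                        [0.0, 0.0, 1.0, 1.0]])  # pelican
x = np.array([0.9, 0.6, 0.2, 0.1])

x_adv = fgsm_attribute_model(x, y=0, W=W, class_attrs=class_attrs, eps=0.7)
print("clean prediction:", int(np.argmax(class_attrs @ (W @ x))))
print("adversarial prediction:", int(np.argmax(class_attrs @ (W @ x_adv))))
print("attribute shift:", W @ x_adv - W @ x)
```

Comparing the predicted attributes before and after the attack, as in the last line, is the starting point for the attribute-based analysis the abstract describes.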

Code & Models

Repositories

sadafgulshad1/Understaning-Misclassifications-by-Attributes (PyTorch)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Explainable Artificial Intelligence (XAI)