Attributional Robustness Training using Input-Gradient Spatial Alignment

Mayank Singh; Nupur Kumari; Puneet Mangla; Abhishek Sinha; Vineeth N Balasubramanian; Balaji Krishnamurthy

arXiv:1911.13073 · cs.CV · July 21, 2020

TL;DR

This paper introduces ART, a training method that makes model explanations more robust by minimizing an upper bound on attributional vulnerability, expressed in terms of the spatial correlation between the input image and its explanation map. This improves both attributional robustness and object localization.

Contribution

The paper proposes a new attributional robustness training method based on input-gradient spatial alignment and a soft-margin triplet loss, achieving state-of-the-art attributional robustness and weakly supervised localization performance.

Findings

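1. Achieves a new state-of-the-art attributional robustness measure, improving on prior work by a margin of roughly 6-18% on several standard datasets.
2. Demonstrates improved weakly supervised object localization performance.
3. Derives an upper bound on attributional vulnerability in terms of the spatial correlation between the input image and its explanation map.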
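
The spatial correlation between an input image and its explanation map can be illustrated by treating both as vectors and taking their cosine similarity. This is a minimal sketch under that assumption: cosine similarity is used here as the correlation measure, the paper's actual bound is not reproduced, and the function name spatial_alignment and the toy 2x2 maps are hypothetical.

```python
import math


def spatial_alignment(image, saliency):
    """Cosine similarity between two equally sized 2D maps, flattened row by row.

    Assumption: cosine similarity stands in for "spatial correlation"; this is
    an illustration, not the bound derived in the paper.
    """
    a = [p for row in image for p in row]
    b = [s for row in saliency for s in row]
    if len(a) != len(b):
        raise ValueError("image and saliency map must have the same number of pixels")
    dot = sum(p * s for p, s in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(s * s for s in b))
    return dot / norm if norm > 0 else 0.0


image = [[0.0, 1.0], [1.0, 0.0]]      # toy 2x2 grayscale image
aligned = [[0.0, 0.8], [0.9, 0.1]]    # saliency concentrated on the bright pixels
scattered = [[1.0, 0.0], [0.0, 1.0]]  # saliency only on the dark pixels
print(round(spatial_alignment(image, aligned), 3))  # 0.995
print(spatial_alignment(image, scattered))          # 0.0
```

A map that highlights the same pixels as the image scores close to 1, while one that highlights unrelated pixels scores 0.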

Abstract

Interpretability is an emerging area of research in trustworthy machine learning. Safe deployment of machine learning systems mandates that both the prediction and its explanation be reliable and robust. Recently, it has been shown that explanations can be easily manipulated by adding visually imperceptible perturbations to the input while keeping the model's prediction intact. In this work, we study the problem of attributional robustness (i.e., models with robust explanations) by showing an upper bound for attributional vulnerability in terms of the spatial correlation between the input image and its explanation map. We propose a training methodology that learns robust features by minimizing this upper bound using a soft-margin triplet loss. Our methodology of robust attribution training (ART) achieves a new state-of-the-art attributional robustness measure by a margin of…
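
The training objective described above can be pictured as a soft-margin triplet loss in which the input image is the anchor, the input gradient of the true class is the positive, and the input gradient of a negative class is the negative. This is a minimal sketch under stated assumptions, not the authors' implementation (that is in the official repository): it uses a hypothetical toy linear classifier so that the input gradient of class k is simply the weight row W[k], measures spatial correlation with cosine similarity, picks the highest-scoring wrong class as the negative, and omits any adversarial search over perturbed inputs as well as any classification loss term.

```python
import math


def cosine(u, v):
    """Cosine similarity between two flattened maps (here: image vs. input gradient)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def logits(weights, x):
    # Hypothetical toy linear classifier without bias: logit_k(x) = sum_i W[k][i] * x[i].
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def input_gradient(weights, k):
    # For the linear model above, d logit_k / dx equals the weight row W[k].
    return list(weights[k])


def soft_margin_triplet_loss(x, weights, true_class):
    scores = logits(weights, x)
    # Assumption: the highest-scoring incorrect class serves as the negative.
    negative = max(
        (k for k in range(len(weights)) if k != true_class),
        key=lambda k: scores[k],
    )
    pos_sim = cosine(x, input_gradient(weights, true_class))
    neg_sim = cosine(x, input_gradient(weights, negative))
    # Soft-margin triplet loss log(1 + exp(neg_sim - pos_sim)): small when the
    # true-class gradient aligns with the image more than the negative's does.
    return math.log1p(math.exp(neg_sim - pos_sim))


x = [1.0, 0.0, 1.0, 0.0]                                   # flattened toy "image"
aligned = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]     # class 0 gradient matches x
misaligned = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]  # class 1 gradient matches x
loss_aligned = soft_margin_triplet_loss(x, aligned, true_class=0)
loss_misaligned = soft_margin_triplet_loss(x, misaligned, true_class=0)
print(round(loss_aligned, 4), round(loss_misaligned, 4))  # 0.3133 1.3133
```

Minimizing this loss pushes the true-class input gradient toward the image and away from the confusing class's gradient. With a real network the gradient depends on x, so it would be computed by automatic differentiation (for example `torch.autograd.grad` with `create_graph=True` so the loss can itself be backpropagated).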

Code & Models

Repositories

nupurkmr9/Attributional-Robustness (PyTorch, official)