Patch Shortcuts: Interpretable Proxy Models Efficiently Find Black-Box Vulnerabilities

Julia Rosenzweig; Joachim Sicking; Sebastian Houben; Michael Mock; Maram Akila

arXiv:2104.11691·cs.CV·April 26, 2021

TL;DR

This paper introduces a method using an interpretable proxy model, specifically a BagNet, to efficiently identify learned shortcuts in black-box neural networks, enhancing safety by uncovering vulnerabilities related to spurious correlations.

Contribution

The paper presents a novel approach that employs an interpretable proxy model to detect shortcut-based vulnerabilities in black-box neural networks, a task that is difficult when access to the network is limited.

Findings

01

Patch shortcuts significantly influence the black-box model.

02

The proxy-based method effectively uncovers local image patch vulnerabilities.

03

Identified shortcuts can be transferred and validated in black-box models.
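
The transfer check described in these findings can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the proxy already supplies a per-patch score map for one class (BagNet-style models classify by aggregating evidence from local patches), and the names `top_patch`, `paste_patch`, `flip_rate`, and `toy_black_box` are hypothetical helpers introduced only for illustration. The black box is queried for predicted labels only.

```python
import numpy as np


def top_patch(image, patch_logits, patch_size, stride):
    """Crop the patch with the highest proxy score for one class.

    `patch_logits` is a 2D array with one score per patch position;
    it is assumed to come from the interpretable proxy model.
    """
    r, c = np.unravel_index(np.argmax(patch_logits), patch_logits.shape)
    top, left = r * stride, c * stride
    return image[top:top + patch_size, left:left + patch_size].copy()


def paste_patch(image, patch, top=0, left=0):
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out


def flip_rate(black_box, images, patch, target_class, top=0, left=0):
    """Fraction of images whose black-box label flips to `target_class`.

    Images already predicted as `target_class` are skipped, so the rate
    reflects only the effect of the pasted patch.
    """
    eligible = 0
    flipped = 0
    for img in images:
        if black_box(img) == target_class:
            continue
        eligible += 1
        if black_box(paste_patch(img, patch, top, left)) == target_class:
            flipped += 1
    return flipped / eligible if eligible else 0.0


def toy_black_box(image):
    """Stand-in black box (assumption): class 1 if any pixel is bright."""
    return int(image.max() > 0.5)


# Toy data: the source image has a bright 4x4 region in the bottom-right corner,
# and the assumed proxy score map rates that region highest.
source = np.zeros((8, 8, 3))
source[4:, 4:] = 1.0
patch_logits = np.array([[0.1, 0.2], [0.0, 3.5]])

patch = top_patch(source, patch_logits, patch_size=4, stride=4)
clean_images = [np.zeros((8, 8, 3)) for _ in range(5)]
rate = flip_rate(toy_black_box, clean_images, patch, target_class=1)
```

A high flip rate on images of other classes would suggest that the black box relies on the patch itself rather than on the surrounding content. The toy classifier here is deliberately trivial; a real study would query the actual black-box model in its place.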

Abstract

An important pillar for safe machine learning (ML) is the systematic mitigation of weaknesses in neural networks to afford their deployment in critical applications. A ubiquitous class of safety risks is learned shortcuts, i.e., spurious correlations a network exploits for its decisions that have no semantic connection to the actual task. Networks relying on such shortcuts bear the risk of not generalizing well to unseen inputs. Explainability methods help to uncover such network vulnerabilities. However, many of these techniques are not directly applicable if access to the network is constrained, in so-called black-box setups. These setups are prevalent when using third-party ML components. To address this constraint, we present an approach to detect learned shortcuts using an interpretable-by-design network as a proxy to the black-box model of interest. Leveraging the proxy's…
