DORA: Exploring Outlier Representations in Deep Neural Networks

Kirill Bykov; Mayukh Deb; Dennis Grinwald; Klaus-Robert Müller; Marina M.-C. Höhne

arXiv:2206.04530 · cs.LG · July 11, 2023 · 6 cites

Open Access · 1 Repo

TL;DR

This paper introduces DORA, a data-agnostic framework for analyzing neural network representations. It uses the proposed Extreme-Activation (EA) distance to identify outlier representations, which often correspond to spurious correlations learned by deep models.

Contribution

DORA is presented as the first data-agnostic framework for analyzing the representational space of DNNs. It uses the EA distance to detect outlier representations, which often encode artifacts and other spurious concepts.

Findings

01. EA metric effectively identifies outlier representations.

02. Outlier representations often correspond to spurious or undesired concepts.

03. Framework validated on real-world computer vision models.
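
To make the first finding concrete, the sketch below flags outlier representations from a precomputed pairwise distance matrix. It is an illustration only: the scoring rule (mean distance to the k nearest neighbours) and the names `knn_outlier_scores` and `dist` are assumptions made for this example, and the outlier detector used by the authors may be different.

```python
def knn_outlier_scores(dist, k=2):
    """Score each unit by its mean distance to its k nearest other units.

    dist is a square, symmetric matrix (list of lists) of pairwise
    distances between representations. The kNN score is an assumed,
    generic outlier criterion, not necessarily the paper's choice.
    """
    scores = []
    for i, row in enumerate(dist):
        # Distances from unit i to every other unit, closest first.
        others = sorted(d for j, d in enumerate(row) if j != i)
        nearest = others[:k]
        scores.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return scores


# Toy example: units 0-2 are close to each other, unit 3 is far from all.
dist = [
    [0.0, 0.1, 0.1, 0.9],
    [0.1, 0.0, 0.1, 0.9],
    [0.1, 0.1, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
scores = knn_outlier_scores(dist, k=2)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
print(scores, most_outlying)
```

A unit with a high score sits far from its neighbours in the distance space, which is the sense in which a representation would be called an outlier here.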

Abstract

Deep Neural Networks (DNNs) excel at learning complex abstractions within their internal representations. However, the concepts they learn remain opaque, a problem that becomes particularly acute when models unintentionally learn spurious correlations. In this work, we present DORA (Data-agnOstic Representation Analysis), the first data-agnostic framework for analyzing the representational space of DNNs. Central to our framework is the proposed Extreme-Activation (EA) distance measure, which assesses similarities between representations by analyzing their activation patterns on data points that cause the highest level of activation. As spurious correlations often manifest in features of data that are anomalous to the desired task, such as watermarks or artifacts, we demonstrate that internal representations capable of detecting such artifactual concepts can be found by analyzing…
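
The idea behind the EA distance, comparing representations by how they respond to the inputs that activate them most, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the extreme inputs are picked from a fixed candidate pool (a data-agnostic variant would instead synthesize them), and the cosine-based formula is an assumed dissimilarity that may differ from the paper's exact definition.

```python
import math


def top_activating_index(activations):
    """Index of the candidate input on which a unit activates most."""
    return max(range(len(activations)), key=lambda k: activations[k])


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def ea_style_distance(acts_i, acts_j):
    """Hypothetical EA-style distance between two units.

    acts_i[k] and acts_j[k] are the activations of unit i and unit j
    on candidate input k.
    """
    k_i = top_activating_index(acts_i)  # extreme input of unit i
    k_j = top_activating_index(acts_j)  # extreme input of unit j
    # Describe each unit by its response to both extreme inputs.
    r_i = (acts_i[k_i], acts_i[k_j])
    r_j = (acts_j[k_i], acts_j[k_j])
    # Assumed dissimilarity: 0 when the response patterns point the same way.
    c = max(-1.0, min(1.0, cosine(r_i, r_j)))
    return math.sqrt((1.0 - c) / 2.0)


# Two units that fire on the same input versus two that fire on different ones.
print(ea_style_distance([0.2, 0.9, 0.1], [0.1, 0.8, 0.2]))
print(ea_style_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this sketch, units whose extreme inputs excite each other receive a small distance, while units that ignore each other's extreme inputs are pushed apart.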

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

lapalap/dora
PyTorch · Official

Videos

No videos yet.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning