DORA: Exploring Outlier Representations in Deep Neural Networks
Kirill Bykov, Mayukh Deb, Dennis Grinwald, Klaus-Robert Müller, Marina M.-C. Höhne

TL;DR
This paper introduces DORA, a data-agnostic framework for analyzing neural network representations. It uses the novel Extreme-Activation (EA) distance to identify outlier features that are related to spurious correlations in deep models.
Contribution
DORA is the first framework to analyze neural representations without depending on data. It uses the EA distance to detect representations of outlier concepts, such as artifacts and spurious correlations.
Findings
EA metric effectively identifies outlier representations.
Outlier representations often correspond to spurious or undesired concepts.
Framework validated on real-world computer vision models.
Abstract
Deep Neural Networks (DNNs) excel at learning complex abstractions within their internal representations. However, the concepts they learn remain opaque, a problem that becomes particularly acute when models unintentionally learn spurious correlations. In this work, we present DORA (Data-agnOstic Representation Analysis), the first data-agnostic framework for analyzing the representational space of DNNs. Central to our framework is the proposed Extreme-Activation (EA) distance measure, which assesses similarities between representations by analyzing their activation patterns on data points that cause the highest level of activation. As spurious correlations often manifest in features of data that are anomalous to the desired task, such as watermarks or artifacts, we demonstrate that internal representations capable of detecting such artifactual concepts can be found by analyzing…
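The abstract describes the EA distance only in words, so the following is a minimal sketch of one plausible instantiation, not the paper's exact definition. It assumes a hypothetical matrix `acts` in which `acts[i, j]` is the mean activation of representation `i` on the inputs that most strongly activate representation `j`. The pairwise two-dimensional vectors, the cosine-based distance, and the mean-distance outlier score are all illustrative assumptions.

```python
import numpy as np


def ea_distance_matrix(acts: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pairwise distances between representations from extreme-activation responses.

    acts[i, j] is assumed to hold the mean activation of representation i on the
    inputs that maximally activate representation j (hypothetical layout).
    """
    n = acts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # How i and j each respond to (i's extreme inputs, j's extreme inputs).
            r_i = np.array([acts[i, i], acts[i, j]])
            r_j = np.array([acts[j, i], acts[j, j]])
            cos = r_i @ r_j / (np.linalg.norm(r_i) * np.linalg.norm(r_j) + eps)
            cos = np.clip(cos, -1.0, 1.0)
            # Identical response patterns give 0, opposite patterns give 1.
            dist[i, j] = dist[j, i] = np.sqrt((1.0 - cos) / 2.0)
    return dist


def outlier_scores(dist: np.ndarray) -> np.ndarray:
    """Score each representation by its mean distance to all others (an assumption)."""
    n = dist.shape[0]
    return dist.sum(axis=1) / (n - 1)
```

Representations with the highest scores respond to their own extreme inputs very differently from the rest of the layer, which makes them candidates for manual inspection. Any standard outlier detector operating on a precomputed distance matrix could replace the simple mean-distance score.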
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
