Towards falsifiable interpretability research

Matthew L. Leavitt; Ari Morcos

arXiv:2010.12016 · cs.CY · October 26, 2020 · 28 citations

Open Access

TL;DR

This paper critiques current interpretability methods for deep neural networks, highlighting their reliance on intuition and lack of falsifiability, and proposes a framework for more robust, evidence-based interpretability research.

Contribution

It introduces a framework for falsifiable interpretability research, encouraging hypothesis-driven methods to improve robustness and validity in understanding DNNs.

Findings

01 Current interpretability methods often rely on intuition.

02 Falsifiability can improve robustness of interpretability.

03 Proposed framework promotes evidence-based insights.

Abstract

Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input which are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpretability research suffers from an over-reliance on intuition-based approaches that risk (and in some cases have caused) illusory progress and misleading conclusions. We identify a set of limitations that we argue impede meaningful progress in interpretability research, and examine two popular classes of interpretability methods, saliency and single-neuron-based approaches, that serve as case studies for how overreliance on intuition and lack of falsifiability can undermine interpretability…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Interpretability