Automatic Discovery of Visual Circuits

Achyuta Rajaram; Neil Chowdhury; Antonio Torralba; Jacob Andreas; Sarah Schwettmann

arXiv:2404.14349·cs.CV·April 23, 2024

TL;DR

This paper presents a scalable method for automatically extracting the subgraph of a deep vision model that underlies recognition of a specific visual concept. The extracted circuits causally affect model output, and editing them can defend large pretrained models from adversarial attacks.

Contribution

It introduces a method that specifies a visual concept with a few examples and then identifies the corresponding circuit by tracing the functional connectivity of neurons, that is, the interdependence of their activations across layers.

Findings

01 Extracted circuits causally influence model output
02 Editing circuits improves model robustness against adversarial attacks
03 Method scales to large pretrained models

Abstract

To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
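The idea of tracing functional connectivity across layers can be illustrated with a small sketch. This is not the authors' implementation: it assumes absolute Pearson correlation as the measure of interdependence, uses synthetic activations in place of a real vision model, and the names `functional_connectivity`, `trace_circuit`, and `threshold` are hypothetical. It only shows the general shape of the procedure: start from a neuron of interest, then walk backwards layer by layer, keeping the earlier neurons whose activations co-vary with those already kept.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def column(acts, k):
    """Activations of neuron k across all examples (acts[i][k] = example i, neuron k)."""
    return [row[k] for row in acts]


def functional_connectivity(earlier, later):
    """conn[j][k] = |correlation| between later-layer neuron j and earlier-layer neuron k.

    Assumption: correlation stands in for whatever interdependence measure is used in practice.
    """
    n_early, n_late = len(earlier[0]), len(later[0])
    return [
        [abs(pearson(column(later, j), column(earlier, k))) for k in range(n_early)]
        for j in range(n_late)
    ]


def trace_circuit(layers, seeds, threshold):
    """Walk backwards from seed neurons in the last layer.

    Returns one set of kept neuron indices per layer.
    """
    kept = [set() for _ in layers]
    kept[-1] = set(seeds)
    for depth in range(len(layers) - 1, 0, -1):
        conn = functional_connectivity(layers[depth - 1], layers[depth])
        for j in kept[depth]:
            for k, strength in enumerate(conn[j]):
                if strength >= threshold:
                    kept[depth - 1].add(k)
    return kept


# Synthetic activations on 500 "examples" (a stand-in for images of one concept).
rng = random.Random(0)
layer0 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
# Later neuron 0 depends on earlier neurons 0 and 1; neuron 1 copies earlier neuron 2;
# neuron 2 is independent noise.
layer1 = [
    [row[0] + row[1] + rng.gauss(0, 0.1), row[2], rng.gauss(0, 1)]
    for row in layer0
]

circuit = trace_circuit([layer0, layer1], seeds=[0], threshold=0.5)
print(circuit)
```

In this toy setting the traced subgraph contains only the two earlier neurons that actually feed the seed neuron. With a real model, the activations would instead be recorded on the handful of concept examples, and the threshold would need tuning.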

Code & Models

Repositories

multimodal-interpretability/visual-circuits (Official)

Taxonomy

Topics: Neural Networks and Applications · Industrial Vision Systems and Defect Detection · Blind Source Separation Techniques