From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation
Geraldin Nanfack, Michael Eickenberg, Eugene Belilovsky

TL;DR
This paper investigates the robustness of visual circuits in deep neural networks against adversarial manipulations, introducing new attacks that reveal their vulnerability and analyzing their stability in the context of feature visualization techniques.
Contribution
It proposes a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations, and it introduces further attacks that reveal the vulnerability of visual circuits.
Findings
Visual circuits show some robustness to the ProxPulse attack.
New attack methods effectively manipulate feature visualizations.
Visual circuits are more manipulable than previously thought.
Abstract
Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Cell Image Analysis Techniques
