The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision

Liv Gorton

arXiv:2406.03662·cs.LG·September 10, 2024·1 cites

Open Access

TL;DR

This paper applies sparse autoencoders (SAEs) to the early vision layers of InceptionV1. The SAEs reveal interpretable features that are not apparent from individual neurons, including additional curve detectors, and decompose some polysemantic neurons into more monosemantic features.

Contribution

The paper demonstrates that sparse autoencoders can uncover interpretable features not apparent from individual neurons, and can decompose some polysemantic neurons, in InceptionV1's early vision layers.

Findings

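01 SAEs uncover additional curve detectors in InceptionV1 that fill gaps left by earlier neuron-level analysis

02 SAEs decompose some polysemantic neurons into more monosemantic constituent features

03 SAEs are a useful tool for understanding InceptionV1 and convolutional neural networks more generally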

Abstract

Recent work on sparse autoencoders (SAEs) has shown promise in extracting interpretable features from neural networks and addressing challenges with polysemantic neurons caused by superposition. In this paper, we apply SAEs to the early vision layers of InceptionV1, a well-studied convolutional neural network, with a focus on curve detectors. Our results demonstrate that SAEs can uncover new interpretable features not apparent from examining individual neurons, including additional curve detectors that fill in previous gaps. We also find that SAEs can decompose some polysemantic neurons into more monosemantic constituent features. These findings suggest SAEs are a valuable tool for understanding InceptionV1, and convolutional neural networks more generally.
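To make the method concrete, here is a minimal sketch of the kind of sparse autoencoder the abstract refers to. It is an illustration under stated assumptions, not the paper's implementation: it assumes a ReLU encoder, a linear decoder, and a squared-error reconstruction loss plus an L1 sparsity penalty, applied to a single activation vector (for a convolutional layer, this could be the channel vector at one spatial position). The function names, dimensions, and `l1_coeff` value are hypothetical, and the weights are random rather than trained.

```python
import random

def matvec(v, W):
    """Multiply row vector v (length n) by matrix W (n rows, m columns)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into non-negative feature activations, then reconstruct it.

    Assumed form: f = ReLU((x - b_dec) W_enc + b_enc), x_hat = f W_dec + b_dec.
    """
    centered = [a - b for a, b in zip(x, b_dec)]
    pre_acts = [a + b for a, b in zip(matvec(centered, W_enc), b_enc)]
    f = [max(0.0, a) for a in pre_acts]  # ReLU keeps features non-negative
    x_hat = [a + b for a, b in zip(matvec(f, W_dec), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty on the features."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in f)
    return recon + l1_coeff * sparsity

def rand_matrix(rows, cols, scale=0.1):
    """Random Gaussian matrix, standing in for trained weights."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
n_channels, n_features = 8, 32  # hypothetical sizes; more features than channels
W_enc = rand_matrix(n_channels, n_features)
W_dec = rand_matrix(n_features, n_channels)
b_enc = [0.0] * n_features
b_dec = [0.0] * n_channels

# Stand-in for one activation vector taken from an early vision layer.
x = [random.gauss(0.0, 1.0) for _ in range(n_channels)]
features, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, x_hat)
```

In this sketch the feature dictionary is wider than the input (32 features for 8 channels). An overcomplete dictionary of this kind is what would allow a trained SAE to assign separate features to the different behaviors of a polysemantic neuron.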

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: Focus