TL;DR
This paper introduces network dissection, a framework for interpreting individual units in deep neural networks, revealing their semantic roles in classification, generation, and adversarial contexts.
Contribution
The work provides a systematic method to identify and analyze the semantic meaning of hidden units in CNNs and GANs, advancing interpretability of deep models.
Findings
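In a CNN trained on scene classification, individual hidden units match a diverse set of object concepts, and these learned object classes play crucial roles in classifying scenes.
In a GAN trained to generate scenes, activating or deactivating small sets of units adds or removes objects in the generated output.
The same framework is applied to understanding adversarial attacks and to image editing.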
Abstract
Deep neural networks excel at finding hierarchical representations that solve complex tasks over large data sets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the…
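To make "units that match object concepts" concrete, here is a minimal sketch of one way such a match could be scored. It assumes, beyond what the abstract states, that a unit's activation maps are thresholded at a top quantile computed over the whole dataset and then compared with binary concept segmentation masks by intersection-over-union (IoU). The function names, the `quantile` parameter and its default value, and the assumption that activation maps are already upsampled to mask resolution are all illustrative, not taken from the paper's code.

```python
def top_quantile_threshold(values, quantile):
    """Return a cutoff t so that roughly `quantile` of `values` are >= t."""
    ordered = sorted(values)
    k = int(len(ordered) * (1.0 - quantile))
    k = min(max(k, 0), len(ordered) - 1)  # clamp to a valid index
    return ordered[k]


def unit_concept_iou(activation_maps, concept_masks, quantile=0.01):
    """Dataset-wide IoU between one unit's top activations and one concept.

    activation_maps: list of 2D lists of floats, one per image
                     (assumed already upsampled to mask resolution).
    concept_masks:   list of 2D lists of 0/1, same shapes as activation_maps.
    quantile:        fraction of pixels treated as "active" (assumed value).
    """
    # The threshold is computed once over every pixel of every image.
    flat = [a for amap in activation_maps for row in amap for a in row]
    threshold = top_quantile_threshold(flat, quantile)

    intersection = 0
    union = 0
    for amap, mask in zip(activation_maps, concept_masks):
        for act_row, mask_row in zip(amap, mask):
            for a, m in zip(act_row, mask_row):
                active = a >= threshold
                present = bool(m)
                intersection += active and present
                union += active or present
    return intersection / union if union else 0.0


# Toy data: two 2x2 "images" for a single hypothetical unit and concept.
acts = [[[0.1, 0.2], [0.9, 0.8]], [[0.0, 0.3], [0.4, 0.7]]]
masks = [[[0, 0], [1, 1]], [[0, 0], [0, 1]]]
iou = unit_concept_iou(acts, masks, quantile=0.25)
```

Under this scheme, a unit would be labeled with the concept that gives the highest IoU, provided the score clears some minimum cutoff. Real activation tensors would normally be handled with an array library rather than nested lists.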
