Unsupervised learning of object semantic parts from internal states of CNNs by population encoding
Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille

TL;DR
This paper introduces an unsupervised method to discover semantic object parts from CNN internal states, revealing that populations of neurons encode meaningful parts and enabling effective part detection.
Contribution
The work proposes a clustering-based approach to extract Visual Concepts that represent semantic parts from CNNs, providing new insights into internal representations and a comprehensive dataset for part annotation.
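As a rough illustration of the clustering idea, the sketch below treats the feature vector at each spatial position of an intermediate CNN layer as a population response and groups those vectors into clusters whose centers play the role of visual concepts. The choice of K-means, the L2 normalization, the number of clusters, and the synthetic feature maps are all assumptions made for this example; the summary does not specify them, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (assumed preprocessing step)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def extract_population_vectors(feature_maps):
    """Turn (N, C, H, W) feature maps into (N*H*W, C) unit vectors,
    one vector of C neuron responses per spatial position."""
    n, c, h, w = feature_maps.shape
    vectors = feature_maps.transpose(0, 2, 3, 1).reshape(-1, c)
    return l2_normalize(vectors)

def assign(vectors, centers):
    """Index of the nearest center (squared Euclidean distance) per vector."""
    d = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(vectors, k, n_iters=20, seed=0):
    """Plain K-means; cluster centers stand in for visual concepts."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = assign(vectors, centers)
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign(vectors, centers)

# Synthetic stand-in for CNN activations: 4 images, 16 channels, 5x5 grid.
rng = np.random.default_rng(0)
feature_maps = rng.random((4, 16, 5, 5))
vectors = extract_population_vectors(feature_maps)
centers, labels = kmeans(vectors, k=3)
```

In practice the feature maps would come from a trained network rather than random numbers, and the image patches assigned to each cluster could then be inspected for visual and semantic coherence.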
Findings
Visual concepts are semantically and visually coherent.
Visual concepts can serve as effective part detectors.
The method covers full object parts, not just sparse keypoints.
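One way to picture a visual concept acting as a part detector is to compare the normalized feature vector at every spatial position with the concept's cluster center and fire where the two are close. The sketch below does this under assumptions that are not taken from the summary: Euclidean distance on unit vectors, a hand-picked threshold, and a synthetic feature map with one planted match. The function name is hypothetical.

```python
import numpy as np

def detect_visual_concept(feature_map, center, threshold):
    """Score every position of a (C, H, W) feature map against one concept.

    Returns a boolean (H, W) detection mask and the (H, W) distance map.
    `center` is assumed to be a unit-length vector of size C.
    """
    c, h, w = feature_map.shape
    vecs = feature_map.reshape(c, -1).T                # (H*W, C)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    vecs = vecs / np.maximum(norms, 1e-12)             # unit-length responses
    dist = np.linalg.norm(vecs - center, axis=1).reshape(h, w)
    return dist < threshold, dist

# Synthetic example: plant a scaled copy of the concept at position (1, 2).
rng = np.random.default_rng(0)
center = rng.random(8)
center /= np.linalg.norm(center)
feature_map = rng.random((8, 4, 4))
feature_map[:, 1, 2] = 3.0 * center

mask, dist = detect_visual_concept(feature_map, center, threshold=0.1)
```

Because the responses are normalized before comparison, the planted position matches regardless of its scale. A real threshold would need to be tuned on annotated data.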
Abstract
We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification. This work provides a new unsupervised method to learn semantic parts and gives new understanding of the internal representations of CNNs. Our technique is based on the hypothesis that semantic parts are represented by populations of neurons rather than by single filters. We propose a clustering technique to extract part representations, which we call Visual Concepts. We show that visual concepts are semantically coherent in that they represent semantic parts, and visually coherent in that corresponding image patches appear very similar. Also, visual concepts provide full spatial coverage of the parts of an object, rather than a few sparse parts as is typically found in keypoint annotations. Furthermore, we…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
