Interpretable Convolutional Neural Networks
Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu

TL;DR
This paper introduces a method for turning traditional CNNs into interpretable models in which each filter in a high conv-layer corresponds to an object part, without requiring any additional part annotations.
Contribution
The proposed approach automatically assigns object part semantics to CNN filters during training, making the network's decision process more transparent without extra supervision.
Findings
Filters in interpretable CNNs are more semantically meaningful than those in traditional CNNs.
The method applies to various CNN architectures.
Enhanced interpretability aids understanding of CNN decision logic.
Abstract
This paper proposes a method to modify traditional convolutional neural networks (CNNs) into interpretable CNNs, in order to clarify knowledge representations in high conv-layers of CNNs. In an interpretable CNN, each filter in a high conv-layer represents a certain object part. We do not need any annotations of object parts or textures to supervise the learning process. Instead, the interpretable CNN automatically assigns an object part to each filter in a high conv-layer during the learning process. Our method can be applied to different types of CNNs with different structures. The clear knowledge representation in an interpretable CNN can help people understand the logic inside a CNN, i.e., which patterns the CNN bases its decision on. Experiments showed that filters in an interpretable CNN were more semantically meaningful than those in traditional CNNs.
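The abstract does not spell out how a filter is tied to a single object part. As a purely illustrative sketch, and not the authors' implementation, one way to picture the idea is to keep only the activations near a filter's strongest response and suppress everything else. The function names, the L1-distance template shape, and the parameters `tau` and `beta` below are assumptions made for this example; a square feature map is also assumed.

```python
import numpy as np

def peak_location(feature_map):
    """Return (row, col) of the strongest activation in a 2-D feature map."""
    return np.unravel_index(np.argmax(feature_map), feature_map.shape)

def part_template(n, center, tau=1.0, beta=4.0):
    """Hypothetical n x n template: equal to tau at `center`, decaying
    linearly with L1 distance from it, and clipped below at -tau."""
    rows, cols = np.indices((n, n))
    l1_distance = np.abs(rows - center[0]) + np.abs(cols - center[1])
    return tau * np.maximum(1.0 - beta * l1_distance / n, -1.0)

def mask_feature_map(feature_map, tau=1.0, beta=4.0):
    """Multiply a square feature map by a template centred on its peak,
    then clip negatives to zero so far-away activations vanish."""
    n = feature_map.shape[0]  # assumption: the map is n x n
    template = part_template(n, peak_location(feature_map), tau, beta)
    return np.maximum(feature_map * template, 0.0)

# Toy feature map: one strong response and one weaker, distant response.
fm = np.zeros((6, 6))
fm[2, 3] = 2.0   # the "part" the filter responds to
fm[5, 0] = 0.5   # a spurious activation elsewhere in the image
masked = mask_feature_map(fm)
```

In this toy case the strong response at (2, 3) is preserved and the distant one at (5, 0) is zeroed, which mirrors the intuition that each filter in a high conv-layer should fire on one localized part rather than on scattered patterns.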
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Neural Networks and Applications
