Explainability and Robustness of Deep Visual Classification Models
Jindong Gu

TL;DR
This paper investigates the explainability and robustness of deep visual classification models, including CNNs, CapsNets, and ViTs, focusing on their core components to address limitations in interpretability and adversarial vulnerability.
Contribution
The study provides a comparative analysis of the core modules of CNNs, CapsNets, and ViTs to enhance understanding of their explainability and robustness.
Findings
CapsNets and ViTs exhibit different vulnerabilities to adversarial attacks.
Core modules like dynamic routing and self-attention influence model robustness.
Insights into model explainability can guide the development of more transparent models.
Abstract
In the computer vision community, Convolutional Neural Networks (CNNs), first proposed in the 1980s, have become the standard visual classification model. Recently, Capsule Networks (CapsNets) and Vision Transformers (ViTs) have been proposed as alternatives to CNNs. CapsNets, which were inspired by information processing in the human brain, are considered to have more inductive bias than CNNs, whereas ViTs are considered to have less. All three classification models have received great attention because they can serve as backbones for various downstream tasks. However, these models are far from perfect. The community has pointed out two weaknesses of standard Deep Neural Networks (DNNs). The first is their lack of explainability. Even though they can achieve or surpass human expert performance in the image…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
