Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin

TL;DR
This paper introduces methodologies to compare the decision-making processes of Transformers and CNNs in visual recognition, revealing differences in compositionality, the impact of normalization, and feature-sharing patterns.
Contribution
It proposes systematic explanation-based methods to analyze and compare the decision mechanisms of different visual backbones, highlighting the role of normalization and feature similarity.
Findings
Transformers and ConvNeXt are more compositional than CNNs.
Batch normalization reduces model compositionality.
Feature-sharing analysis reveals similarities among different backbones.
Abstract
To gain insight into the decision-making of different visual recognition backbones, we propose two methodologies, sub-explanation counting and cross-testing, that systematically apply deep explanation algorithms on a dataset-wide basis and compare the statistics generated from the amount and nature of the explanations. These methodologies reveal differences among networks in terms of two properties called compositionality and disjunctivism. Transformers and ConvNeXt are found to be more compositional, in the sense that they jointly consider multiple parts of the image in building their decisions, whereas traditional CNNs and distilled transformers are less compositional and more disjunctive, which means that they use multiple diverse but smaller sets of parts to achieve a confident prediction. Through further experiments, we pinpointed the choice of normalization to be…
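The contrast between compositional and disjunctive behavior can be made concrete with a toy sketch. This is not the paper's algorithm, which applies deep explanation methods to real networks. It assumes an image split into four hypothetical parts and two hand-written scoring functions standing in for model confidence. The names `minimal_sufficient_subsets`, `compositional_score`, and `disjunctive_score` are illustrative only.

```python
from itertools import combinations


def minimal_sufficient_subsets(score, n_parts, threshold=0.9):
    """Enumerate minimal sets of parts whose score reaches the threshold.

    `score` is a hypothetical stand-in for a model's confidence when only
    the given parts of the image are visible.
    """
    minimal = []
    for size in range(1, n_parts + 1):
        for subset in combinations(range(n_parts), size):
            candidate = frozenset(subset)
            # A superset of an already-sufficient set is not minimal.
            if any(found <= candidate for found in minimal):
                continue
            if score(candidate) >= threshold:
                minimal.append(candidate)
    return minimal


def compositional_score(visible):
    # Toy "compositional" model: confident only when all four parts are visible.
    return 1.0 if len(visible) == 4 else 0.2


def disjunctive_score(visible):
    # Toy "disjunctive" model: either of two small part pairs is enough.
    return 1.0 if {0, 1} <= visible or {2, 3} <= visible else 0.2


comp = minimal_sufficient_subsets(compositional_score, 4)
disj = minimal_sufficient_subsets(disjunctive_score, 4)
print(len(comp), [sorted(s) for s in comp])
print(len(disj), [sorted(s) for s in disj])
```

In this toy setting the compositional scorer admits a single large explanation covering every part, while the disjunctive scorer admits several smaller, non-overlapping ones. Exhaustive enumeration is only feasible here because there are four parts; it is used for clarity, not as a claim about how the paper searches for explanations.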
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Layer Normalization · Batch Normalization · ConvNeXt
