CA-Stream: Attention-based pooling for interpretable image recognition
Felipe Torres, Hanwei Zhang, Ronan Sicre, Stéphane Ayache, Yannis Avrithis

TL;DR
CA-Stream introduces an attention-based pooling mechanism that improves interpretability in image recognition models by replacing traditional pooling methods while maintaining high recognition accuracy.
Contribution
The paper proposes CA-Stream, a novel cross-attention pooling method that enhances interpretability without sacrificing recognition performance.
Findings
CA-Stream achieves comparable accuracy to standard pooling methods.
It provides more interpretable attention-based saliency maps.
Its cross-attention blocks interact with features at different network depths.
Abstract
Explanations obtained from transformer-based architectures in the form of raw attention can be seen as a class-agnostic saliency map. Additionally, attention-based pooling serves as a form of masking in the feature space. Motivated by this observation, we design an attention-based pooling mechanism intended to replace Global Average Pooling (GAP) at inference. This mechanism, called Cross-Attention Stream (CA-Stream), comprises a stream of cross-attention blocks interacting with features at different network depths. CA-Stream enhances interpretability in models while preserving recognition performance.
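To make the idea of attention-based pooling concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`cross_attention_pool`, `ca_stream`), the single-head design, the residual update, and the per-stage projection matrices are all assumptions made for illustration, and the paper's blocks may differ in these details. The sketch shows how a single query vector can pool a feature map with attention weights, and how those same weights form a class-agnostic saliency map over spatial locations.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def cross_attention_pool(features, query, w_k=None, w_v=None):
    """Pool an (H, W, D) feature map with a single query vector.

    Returns the pooled D-vector and an (H, W) attention map.
    w_k and w_v are hypothetical (D, D) key/value projections;
    the identity is used when they are None.
    """
    h, w, d = features.shape
    tokens = features.reshape(h * w, d)      # one token per spatial location
    keys = tokens if w_k is None else tokens @ w_k
    values = tokens if w_v is None else tokens @ w_v
    scores = keys @ query / np.sqrt(d)       # scaled dot-product similarity
    attn = softmax(scores)                   # weights sum to 1 over locations
    pooled = attn @ values                   # attention-weighted average
    return pooled, attn.reshape(h, w)

def ca_stream(feature_maps, query, projections):
    """Pass the query through one cross-attention step per network depth.

    feature_maps: list of (H_l, W_l, D_l) arrays from successive stages.
    projections:  list of (D_l, D_next) matrices that match channel widths
                  between stages (an assumption of this sketch).
    Returns the final query and the attention map of the last stage.
    """
    saliency = None
    for fmap, proj in zip(feature_maps, projections):
        pooled, saliency = cross_attention_pool(fmap, query)
        query = (query + pooled) @ proj      # residual update, then project
    return query, saliency

# Usage: a zero query yields uniform attention, so pooling reduces to GAP.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(7, 7, 16))
pooled, attn = cross_attention_pool(fmap, np.zeros(16))
gap = fmap.mean(axis=(0, 1))
```

With a zero query and identity projections, every location receives the same weight and the pooled vector equals the GAP output, which is one way to see attention-based pooling as a generalization of GAP. A non-zero query shifts weight toward some locations, which is the masking-in-feature-space view described in the abstract.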
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Average Pooling · Global Average Pooling
