CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks
Rajdeep Singh Hundal, Yan Xiao, Jin Song Dong, Manuel Rigger

TL;DR
CAMAL is a scalable method that uses segmentation masks to enhance attention alignment and faithfulness in vision models, improving explainability and generalization without extra inference cost.
Contribution
The paper introduces CAMAL, a novel regularizer that leverages segmentation masks to improve attention quality in vision models across different learning paradigms.
Findings
CAMAL significantly improves attention alignment in all tested settings.
CAMAL enhances attention faithfulness by over 35% compared to recent methods.
Improved attention leads to better explainability and comparable or improved generalization.
Abstract
Many vision datasets now provide segmentation masks in addition to annotated images to support a wide range of tasks. In this work, we propose Class Activation Map Attention Learning (CAMAL), an efficient and scalable method that utilizes segmentation masks to improve attention alignment and faithfulness in vision models. Specifically, attention alignment refers to the degree to which a model's attention aligns with ground-truth discriminative regions, while attention faithfulness refers to the degree to which a model's attention influences its decision. Improving both attention alignment and faithfulness is essential for ensuring that model attention is both spatially accurate and causally meaningful. To improve attention alignment and faithfulness in vision models, CAMAL first extracts the model's attention for each image during training and then compares the attention to ground-truth…
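To make the notion of attention alignment concrete, here is a minimal sketch. It is not CAMAL's actual procedure, since the abstract is cut off before the comparison step is described. It assumes the classic class activation map (a classifier-weighted sum of the final convolutional feature maps) as the attention, and it scores alignment as the fraction of attention mass that falls inside the segmentation mask. The function names `class_activation_map` and `alignment_score` are hypothetical, chosen for this illustration only.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Classic CAM: weight each feature channel by the target class weight and sum.

    features:      array of shape (C, H, W), final conv feature maps (assumed input)
    class_weights: array of shape (C,), classifier weights for the target class
    Returns an (H, W) map, clipped at zero and scaled to [0, 1].
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # shape (H, W)
    cam = np.maximum(cam, 0.0)  # keep only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def alignment_score(cam, mask):
    """Fraction of total attention mass inside the binary mask.

    This is an illustrative metric, not the one used in the paper.
    Returns 0.0 when the map is all zeros.
    """
    total = cam.sum()
    if total == 0:
        return 0.0
    return float((cam * mask).sum() / total)


if __name__ == "__main__":
    # Toy example: two channels, each active in a different 2x2 corner.
    features = np.zeros((2, 4, 4))
    features[0, :2, :2] = 1.0   # channel 0 fires on the top-left region
    features[1, 2:, 2:] = 1.0   # channel 1 fires on the bottom-right region

    mask = np.zeros((4, 4))
    mask[:2, :2] = 1.0          # ground-truth object occupies the top-left

    cam = class_activation_map(features, np.array([1.0, 1.0]))
    print("alignment:", alignment_score(cam, mask))
```

A score near 1 means nearly all attention lies on the annotated object, while a score near 0 means the model attends mostly to background. A training-time regularizer in this spirit would penalize low alignment, but the exact loss CAMAL uses should be taken from the paper itself.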
