CAMAL: Improving Attention Alignment and Faithfulness with Segmentation Masks

Rajdeep Singh Hundal; Yan Xiao; Jin Song Dong; Manuel Rigger

arXiv:2605.08325·eess.IV·May 12, 2026

TL;DR

CAMAL is a scalable method that uses segmentation masks to enhance attention alignment and faithfulness in vision models, improving explainability and generalization without extra inference cost.

Contribution

The paper introduces CAMAL, a regularizer that uses segmentation masks to improve attention quality in vision models across different learning paradigms.

Findings

01  CAMAL significantly improves attention alignment in all tested settings.

02  CAMAL enhances attention faithfulness by over 35% compared to recent methods.

03  Improved attention leads to better explainability and comparable or improved generalization.

Abstract

Many vision datasets now provide segmentation masks in addition to annotated images to support a wide range of tasks. In this work, we propose Class Activation Map Attention Learning (CAMAL), an efficient and scalable method that utilizes segmentation masks to improve attention alignment and faithfulness in vision models. Specifically, attention alignment refers to the degree to which a model's attention aligns with ground-truth discriminative regions, while attention faithfulness refers to the degree to which a model's attention influences its decision. Improving both attention alignment and faithfulness is essential for ensuring that model attention is both spatially accurate and causally meaningful. To improve attention alignment and faithfulness in vision models, CAMAL first extracts the model's attention for each image during training and then compares the attention to ground-truth…
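The training-time step described in the abstract (extract the model's attention, then compare it to the ground truth) can be illustrated with a rough sketch. The abstract is cut off before it specifies the comparison, so everything past the class activation map itself is an assumption: the mean-squared-error penalty, the ReLU and max-normalization, the weighting factor `lam`, and all function names are hypothetical and are not taken from the paper. The sketch uses plain Python lists for readability; an actual regularizer would be written in a differentiable framework so that gradients flow through the attention map.

```python
def class_activation_map(features, class_weights):
    """Classic CAM: weighted sum of K feature maps (each H x W).

    ReLU and max-normalization to [0, 1] are assumptions made here so the
    map is comparable to a binary mask.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(features, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    cam = [[max(v, 0.0) for v in row] for row in cam]  # keep positive evidence only
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def alignment_penalty(cam, mask):
    """Hypothetical comparison: mean squared error between CAM and binary mask."""
    count = len(cam) * len(cam[0])
    squared = sum(
        (c - m) ** 2
        for cam_row, mask_row in zip(cam, mask)
        for c, m in zip(cam_row, mask_row)
    )
    return squared / count


def total_loss(task_loss, cam, mask, lam=0.1):
    """Task loss plus the weighted alignment penalty (lam is an assumed knob)."""
    return task_loss + lam * alignment_penalty(cam, mask)


# Toy example: two 2x2 feature maps, one class, mask covering the top-left pixel.
features = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 2.0], [0.0, 0.0]]]
class_weights = [1.0, 0.5]
mask = [[1.0, 0.0], [0.0, 0.0]]

cam = class_activation_map(features, class_weights)
penalty = alignment_penalty(cam, mask)
loss = total_loss(1.0, cam, mask)
```

In this toy case the map attends equally to the top-left and top-right pixels, but only the top-left lies inside the mask, so the penalty is nonzero. Because a term like this exists only in the training objective, it is consistent with the TL;DR's claim of no extra inference cost.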
