Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
Kabilan Elangovan, Daniel Ting

TL;DR
This paper introduces the C-Score, an annotation-free metric that measures how consistently CAM-based explanations are reproduced across instances of the same class in medical image classification.
Contribution
The paper proposes the C-Score metric for assessing explanation consistency and uses it to reveal mechanisms of explanation failure and to provide early warning of model instability.
Findings
The C-Score detects explanation deterioration before AUC collapses.
ScoreCAM's performance varies across architectures, which informs deployment choices.
The analysis identifies three mechanisms that cause explanation inconsistency.
Abstract
Class Activation Mapping (CAM) methods are widely used to generate visual explanations for deep learning classifiers in medical imaging. However, existing evaluation frameworks assess whether explanations are correct, measured by localisation fidelity against radiologist annotations, rather than whether they are consistent, that is, whether the model applies the same spatial reasoning strategy across different patients with the same pathology. We propose the C-Score (Consistency Score), a confidence-weighted, annotation-free metric that quantifies intra-class explanation reproducibility via intensity-emphasised pairwise soft IoU across correctly classified instances. We evaluate six CAM techniques (GradCAM, GradCAM++, LayerCAM, EigenCAM, ScoreCAM, and MS GradCAM++) across three CNN architectures (DenseNet201, InceptionV3, ResNet50V2) over thirty training epochs on the Kermany chest X-ray dataset,…
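The abstract describes the C-Score only in words, so the following is a minimal sketch of one way such a score could be computed, not the authors' formula. It assumes that soft IoU is the sum of element-wise minima over the sum of element-wise maxima, that "intensity emphasis" means raising each heatmap to a power gamma, and that each pair is weighted by the product of the two prediction confidences. The names soft_iou, c_score_sketch, and gamma are hypothetical.

```python
import numpy as np


def soft_iou(a, b, eps=1e-8):
    """Soft IoU of two non-negative maps: sum(min) / sum(max)."""
    inter = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return float(inter / (union + eps))


def c_score_sketch(heatmaps, confidences, gamma=2.0):
    """Hypothetical consistency score for a single class.

    heatmaps:    array of shape (n, H, W), CAM heatmaps scaled to [0, 1],
                 taken only from correctly classified instances of the class.
    confidences: array of shape (n,), predicted probability of that class
                 (assumed strictly positive).
    gamma:       assumed intensity-emphasis exponent; NOT taken from the paper.
    """
    # Assumed intensity emphasis: gamma > 1 suppresses weak activations.
    maps = np.asarray(heatmaps, dtype=float) ** gamma
    conf = np.asarray(confidences, dtype=float)
    n = len(maps)
    if n < 2:
        raise ValueError("need at least two instances to form a pair")

    weighted_sum = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed pair weight: product of the two confidences.
            w = conf[i] * conf[j]
            weighted_sum += w * soft_iou(maps[i], maps[j])
            weight_total += w
    return weighted_sum / weight_total


# Toy usage: two disjoint half-image maps, and two identical maps.
left = np.zeros((4, 4))
left[:, :2] = 1.0
right = np.zeros((4, 4))
right[:, 2:] = 1.0
score_disjoint = c_score_sketch(np.stack([left, right]), [0.9, 0.8])
score_identical = c_score_sketch(np.stack([left, left]), [0.9, 0.8])
```

Under these assumptions the score lies in [0, 1]: identical heatmaps give a value close to 1, and heatmaps with no spatial overlap give 0. No radiologist annotations enter the computation, which matches the annotation-free property the abstract claims.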
