Soft-CAM: Making black box models self-explainable for medical image analysis
Kerol Djoumessi, Philipp Berens

TL;DR
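SoftCAM turns standard CNNs into inherently interpretable models for medical image analysis by removing global average pooling and replacing the fully connected classification layer with a convolution-based class evidence layer, which yields explicit class activation maps without sacrificing accuracy.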
Contribution
This work introduces SoftCAM, an approach that makes CNNs self-explainable through a small architectural modification, removing the need for post-hoc explanation methods.
Findings
SoftCAM maintains classification accuracy on the three medical datasets used for evaluation.
It produces more reliable and interpretable class activation maps.
The approach outperforms existing post-hoc explanation methods.
Abstract
Convolutional neural networks (CNNs) are widely used for high-stakes applications like medicine, often surpassing human performance. However, most explanation methods rely on post-hoc attribution, approximating the decision-making process of already trained black-box models. These methods are often sensitive, unreliable, and fail to reflect true model reasoning, limiting their trustworthiness in critical applications. In this work, we introduce SoftCAM, a straightforward yet effective approach that makes standard CNN architectures inherently interpretable. By removing the global average pooling layer and replacing the fully connected classification layer with a convolution-based class evidence layer, SoftCAM preserves spatial information and produces explicit class activation maps that form the basis of the model's predictions. Evaluated on three medical datasets, SoftCAM maintains…
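The architectural change described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names (`class_evidence_maps`, `logits_from_maps`, `gap_then_fc`) are hypothetical, the class evidence layer is assumed to be a 1x1 convolution, and the step that turns each evidence map into a class score is assumed to be plain spatial averaging, which the truncated abstract does not specify. Under those assumptions, the sketch shows that the evidence-map head yields the same logits as a global-average-pooling plus fully connected head with the same weights, while also exposing one spatial map per class.

```python
# Illustrative sketch only: names, the 1x1 convolution, and the spatial
# averaging step are assumptions, not the paper's implementation.
import random


def class_evidence_maps(features, weights, biases):
    """Apply a 1x1 convolution, giving one evidence map per class.

    features: [C][H][W] feature maps from a CNN backbone
    weights:  [K][C]    one weight vector per class
    biases:   [K]       one bias per class
    returns:  [K][H][W] class evidence maps
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    return [
        [
            [
                biases[k] + sum(weights[k][c] * features[c][i][j] for c in range(C))
                for j in range(W)
            ]
            for i in range(H)
        ]
        for k in range(len(weights))
    ]


def logits_from_maps(maps):
    """Turn each evidence map into a class score by spatial averaging (assumed)."""
    return [sum(sum(row) for row in m) / (len(m) * len(m[0])) for m in maps]


def gap_then_fc(features, weights, biases):
    """Conventional head: global average pooling, then a fully connected layer."""
    pooled = [sum(sum(row) for row in f) / (len(f) * len(f[0])) for f in features]
    return [
        biases[k] + sum(w * p for w, p in zip(weights[k], pooled))
        for k in range(len(weights))
    ]


random.seed(0)
C, H, W, K = 4, 3, 3, 2  # channels, height, width, classes (toy sizes)
features = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
weights = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(K)]
biases = [random.uniform(-1, 1) for _ in range(K)]

maps = class_evidence_maps(features, weights, biases)
softcam_logits = logits_from_maps(maps)
baseline_logits = gap_then_fc(features, weights, biases)
```

With linear averaging the two heads agree up to floating-point error, so each entry of `maps` shows how much every spatial location contributes to its class score. A model trained with the evidence layer in place would learn these maps directly instead of having them derived after the fact.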
