When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations
Kabilan Elangovan, Daniel Ting

TL;DR
This study investigates how fine-tuning affects the visual explanations of chest X-ray classifiers, revealing architecture-dependent shifts in attribution structures despite stable accuracy.
Contribution
The paper introduces the concept of semantic drift in explanations and demonstrates that it depends on architecture, training phase, and attribution method in medical imaging models.
Findings
Coarse anatomical localization remains stable across architectures.
Overlap IoU shows architecture-dependent reorganization of evidential structure.
Explanation stability can reverse between different attribution methods.
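As a rough illustration of the two kinds of measurement these findings refer to, the sketch below compares two attribution maps for the same image, for example one from the transfer-learning stage and one after fine-tuning. It is a minimal sketch under stated assumptions, not the paper's implementation: the top-fraction thresholding rule, the `fraction` parameter, the centroid-based localization measure, and all function names are hypothetical and chosen here for illustration only.

```python
import numpy as np


def top_fraction_mask(attr, fraction=0.2):
    """Binary mask of the highest-attribution pixels.

    Assumption: keeping the top `fraction` of pixels is an illustrative
    binarization rule, not one taken from the paper.
    """
    attr = np.asarray(attr, dtype=float)
    k = max(1, int(round(fraction * attr.size)))
    # Value of the k-th largest pixel; ties at this value are all kept,
    # so the mask can contain slightly more than k pixels.
    threshold = np.partition(attr.ravel(), -k)[-k]
    return attr >= threshold


def overlap_iou(attr_a, attr_b, fraction=0.2):
    """Intersection over union of the two binarized attribution maps."""
    mask_a = top_fraction_mask(attr_a, fraction)
    mask_b = top_fraction_mask(attr_b, fraction)
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    # Each mask has at least one pixel, so the union is never zero.
    return float(intersection) / float(union)


def centroid_shift(attr_a, attr_b):
    """Distance between attribution-weighted centroids, divided by the
    image diagonal (a hypothetical proxy for coarse localization)."""
    def centroid(attr):
        weights = np.abs(np.asarray(attr, dtype=float))
        total = weights.sum()
        rows, cols = np.indices(weights.shape)
        if total == 0:
            # No attribution mass: fall back to the image center.
            return np.array([(weights.shape[0] - 1) / 2.0,
                             (weights.shape[1] - 1) / 2.0])
        return np.array([(rows * weights).sum() / total,
                         (cols * weights).sum() / total])

    height, width = np.asarray(attr_a).shape
    diagonal = np.hypot(height, width)
    return float(np.linalg.norm(centroid(attr_a) - centroid(attr_b)) / diagonal)
```

Under this sketch, two maps can have nearly identical centroids (stable coarse localization) while their thresholded regions overlap only partially (low IoU), which is the pattern the first two findings describe.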
Abstract
Transfer learning followed by fine-tuning is widely adopted in medical image classification due to consistent gains in diagnostic performance. However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions. We define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting potential shifts in underlying visual reasoning despite stable classification performance. Using a five-class chest X-ray task, we evaluate DenseNet201, ResNet50V2, and InceptionV3 under a two-stage training protocol and quantify drift with reference-free metrics capturing spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, while…
