Transferring Visual Explainability of Self-Explaining Models to Prediction-Only Models without Additional Training
Yuya Yoshikawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito

TL;DR
This paper introduces a method to transfer visual explanation capabilities from self-explaining models to prediction-only models across domains, enhancing interpretability without retraining or significant accuracy loss.
Contribution
It presents a method, based on task arithmetic, that transfers explanation capability to prediction-only models without additional training, extending interpretability to existing models.
Findings
Explanation transfer succeeds between related domains.
Explanation quality improves in the target domain.
Classification accuracy remains largely unaffected.
Abstract
In image classification scenarios where both prediction and explanation efficiency are required, self-explaining models that perform both tasks in a single inference are effective. However, for users who already have prediction-only models, training a new self-explaining model from scratch imposes significant costs in terms of both labeling and computation. This study proposes a method to transfer the visual explanation capability of self-explaining models learned in a source domain to prediction-only models in a target domain based on a task arithmetic framework. Our self-explaining model comprises an architecture that extends Vision Transformer-based prediction-only models, enabling the proposed method to endow many trained prediction-only models with explanation capability without additional training. Experiments on various image classification datasets demonstrate that, except for…
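The abstract names task arithmetic as the transfer mechanism but does not spell out the update rule. The following is a minimal sketch of generic task arithmetic, not the authors' implementation. It assumes each model's parameters are stored as a dict mapping names to NumPy arrays. An "explanation task vector" is taken as the difference between a source self-explaining model and its source prediction-only counterpart, then added, scaled by a factor, to a target prediction-only model: target_new = target + scale * (source_explaining - source_prediction). The function names, the parameter keys (`backbone.w`, `head.w`, `explainer.w`), the `scale` argument, the rule of skipping parameters whose shapes differ (for example, class-specific heads), and the rule of copying explanation-only parameters from the source are all hypothetical choices made for illustration.

```python
import numpy as np

Params = dict  # hypothetical layout: parameter name -> np.ndarray


def task_vector(tuned: Params, base: Params) -> Params:
    """Task vector = tuned - base, over the parameters both models share."""
    return {k: tuned[k] - base[k] for k in base if k in tuned}


def transfer_explainability(src_pred: Params, src_expl: Params,
                            tgt_pred: Params, scale: float = 1.0) -> Params:
    """Sketch: add the source 'explanation' task vector to a target
    prediction-only model (assumption: shared parameters align by name)."""
    tau = task_vector(src_expl, src_pred)
    out = {}
    for k, v in tgt_pred.items():
        if k in tau and tau[k].shape == v.shape:
            out[k] = v + scale * tau[k]  # shared, shape-compatible: shift it
        else:
            out[k] = v.copy()  # e.g. a head with a different class count
    # Parameters that exist only in the self-explaining model (hypothetical
    # explanation-specific modules) are copied over unchanged.
    for k, v in src_expl.items():
        if k not in src_pred:
            out[k] = v.copy()
    return out


# Toy usage with made-up numbers.
src_pred = {"backbone.w": np.array([1.0, 2.0]), "head.w": np.array([0.5])}
src_expl = {"backbone.w": np.array([1.5, 1.0]), "head.w": np.array([0.5]),
            "explainer.w": np.array([3.0])}
tgt_pred = {"backbone.w": np.array([0.0, 0.0]),
            "head.w": np.array([0.1, 0.2, 0.3])}  # 3 target classes

out = transfer_explainability(src_pred, src_expl, tgt_pred)
print(out["backbone.w"])  # shifted by the task vector [0.5, -1.0]
```

In this toy setting, the target backbone receives the source task vector. The target head is left untouched because its shape differs from the source head, and the hypothetical `explainer.w` module is carried over from the source. A real transfer would also depend on how the source and target prediction-only models relate, for example whether they share a pretrained initialization. This sketch does not model that.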
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics
