TL;DR
This paper introduces a framework for explaining model adaptations over time using contrastive explanations, addressing the challenge of interpreting non-static AI systems and their changes.
Contribution
It proposes a novel framework and method for automatically identifying data regions affected by model updates, enhancing interpretability of evolving models.
Findings
Framework effectively explains model adaptations
Method accurately identifies affected data regions
Empirical evaluation demonstrates practical usefulness
Abstract
Many decision-making systems deployed in the real world are not static: they change over time, a phenomenon known as model adaptation. The need for transparency and interpretability of AI-based decision models is widely accepted and has thus been studied extensively. Explanation methods usually assume that the system to be explained is static. Explaining non-static systems is still an open research question, which raises the challenge of how to explain model adaptations. In this contribution, we propose and empirically evaluate a framework for explaining model adaptations by contrastive explanations. We also propose a method for automatically finding regions in data space that are affected by a given model adaptation and thus should be explained.
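To make the idea of "regions in data space affected by a model adaptation" concrete, here is a minimal sketch. It is not the paper's method, which the abstract does not specify. It assumes a simple proxy: a sample counts as affected if the old and new models predict different labels for it, and the affected region is summarized by a per-feature bounding box. The names `disagreement_points`, `bounding_box`, `old_model`, and `new_model` are hypothetical, as are the toy thresholds and data.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Model = Callable[[Point], int]


def disagreement_points(old: Model, new: Model, data: List[Point]) -> List[Point]:
    """Return the samples whose predicted label changed after the adaptation."""
    return [x for x in data if old(x) != new(x)]


def bounding_box(points: List[Point]) -> List[Tuple[float, float]]:
    """Summarize a set of points by a per-feature (min, max) interval."""
    if not points:
        return []
    dims = len(points[0])
    return [(min(p[d] for p in points), max(p[d] for p in points)) for d in range(dims)]


# Hypothetical models (an assumption for illustration): a decision threshold
# on feature 0 that shifts from 0.5 to 0.7 when the model is adapted.
def old_model(x: Point) -> int:
    return int(x[0] > 0.5)


def new_model(x: Point) -> int:
    return int(x[0] > 0.7)


# Toy data set, also assumed for illustration.
data = [(0.1, 0.2), (0.55, 0.9), (0.65, 0.1), (0.8, 0.4)]

affected = disagreement_points(old_model, new_model, data)
region = bounding_box(affected)
print(affected)
print(region)
```

In this toy setup only the samples whose first feature lies between the old and new thresholds change label, so those are the samples a contrastive explanation of the adaptation would focus on. A bounding box is a crude region summary; any clustering or rule-learning step could be swapped in.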
