Classifier Reconstruction Through Counterfactual-Aware Wasserstein Prototypes
Xuan Zhao, Zhuo Cao, Arya Bangun, Hanno Scharr, Ira Assent

TL;DR
This paper introduces a novel method for model reconstruction that leverages counterfactual explanations and Wasserstein barycenters to improve surrogate model fidelity, especially in data-limited scenarios.
Contribution
The paper integrates counterfactuals with original data samples via Wasserstein barycenters to approximate class prototypes more accurately and to reduce decision boundary shift.
Findings
Improved fidelity between surrogate and target models.
Enhanced class prototype approximation using counterfactuals.
Mitigated decision boundary shift in model reconstruction.
Abstract
Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfactuals can also be leveraged for model reconstruction, where a surrogate model is trained to replicate the behavior of a target model. In this work, we demonstrate that model reconstruction can be significantly improved by recognizing that counterfactuals, which typically lie close to the decision boundary, can serve as informative though less representative samples for both classes. This is particularly beneficial in settings with limited access to labeled data. We propose a method that integrates original data samples with counterfactuals to approximate class prototypes using the Wasserstein barycenter, thereby preserving the underlying distributional structure of each class. This approach enhances…
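The barycenter idea in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' algorithm. It assumes scalar features, uniform sample weights, and a hypothetical mixing weight `weight_original`, and it relies on the standard fact that in one dimension the 2-Wasserstein barycenter is obtained by averaging quantile functions. The function name and all parameters are illustrative only.

```python
import numpy as np

def wasserstein_prototype_1d(original, counterfactual,
                             weight_original=0.5, n_quantiles=100):
    """Approximate the 1D 2-Wasserstein barycenter of two empirical samples.

    Assumption: scalar features and uniform sample weights. In 1D, the
    barycenter's quantile function is the weighted average of the input
    quantile functions, so we average quantiles at a fixed set of levels.
    """
    # Midpoint quantile levels in (0, 1); avoids the extreme 0 and 1 levels.
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    q_orig = np.quantile(np.asarray(original, dtype=float), levels)
    q_cf = np.quantile(np.asarray(counterfactual, dtype=float), levels)
    # Weighted average of the two quantile functions (hypothetical weighting).
    return weight_original * q_orig + (1.0 - weight_original) * q_cf

# Toy data: original samples far from a boundary at 0,
# counterfactuals clustered close to it.
rng = np.random.default_rng(0)
original = rng.normal(loc=2.0, scale=0.5, size=200)
counterfactual = rng.normal(loc=0.2, scale=0.1, size=50)

prototype = wasserstein_prototype_1d(original, counterfactual,
                                     weight_original=0.8)
print(prototype.mean())  # lies between the two sample means
```

Unlike naive pooling of the two sample sets, the quantile average keeps the shape of a single distribution rather than producing a two-cluster mixture. In higher dimensions there is no such closed form, and an iterative optimal transport solver would be needed instead.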
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
