TL;DR
This paper introduces XPROAX, a novel method for generating local explanations for text classifiers by progressively approximating neighborhoods with counterfactuals and factuals, improving explanation quality and stability.
Contribution
It proposes a two-stage sampling approach that uses counterfactuals as landmarks to build the neighborhood for textual explanations, addressing the curse of dimensionality that hampers purely random sampling from the latent space.
Findings
Outperforms competitors in usefulness and stability
Achieves better completeness, compactness, and correctness
Effective for real-world text classification datasets
Abstract
The importance of the neighborhood for training a local surrogate model to approximate the local decision boundary of a black box classifier has already been highlighted in the literature. Several attempts have been made to construct a better neighborhood for high-dimensional data, like texts, by using generative autoencoders. However, existing approaches mainly generate neighbors by selecting purely at random from the latent space and struggle under the curse of dimensionality to learn a good local decision boundary. To overcome this problem, we propose a progressive approximation of the neighborhood using counterfactual instances as initial landmarks and a careful two-stage sampling approach to refine counterfactuals and generate factuals in the neighborhood of the input instance to be explained. Our work focuses on textual data and our explanations consist of both word-level…
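The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea (counterfactual landmarks first, then refinement toward the input), not the XPROAX implementation. It assumes a latent vector `z0` for the input text, a hypothetical `black_box` function standing in for "decode the latent vector and classify the resulting text", Gaussian sampling for the first stage, and linear interpolation for the second stage. All names and parameters are illustrative.

```python
import numpy as np

def black_box(z):
    # Hypothetical stand-in for "decode latent vectors, then classify the text".
    # A fixed linear rule is used only so the sketch runs on its own.
    w = np.array([1.0, -0.5, 0.25, 0.0])
    return (z @ w > 0.5).astype(int)

def progressive_neighborhood(z0, predict, n_init=200, n_landmarks=5,
                             n_steps=20, radius=3.0, seed=0):
    """Return (factuals, counterfactuals) around z0 in latent space."""
    rng = np.random.default_rng(seed)
    y0 = predict(z0[None, :])[0]

    # Stage 1 (assumed): sample broadly and keep the counterfactuals
    # closest to z0 as initial landmarks.
    candidates = z0 + rng.normal(scale=radius, size=(n_init, z0.size))
    cf = candidates[predict(candidates) != y0]
    if len(cf) == 0:
        raise ValueError("no counterfactual found; increase radius or n_init")
    order = np.argsort(np.linalg.norm(cf - z0, axis=1))
    landmarks = cf[order[:n_landmarks]]

    # Stage 2 (assumed): walk from z0 toward each landmark. Points that flip
    # the label are refined counterfactuals; the rest are factuals.
    alphas = np.linspace(0.0, 1.0, n_steps + 1)[1:]
    factuals, counterfactuals = [], []
    for lm in landmarks:
        path = z0 + alphas[:, None] * (lm - z0)
        labels = predict(path)
        factuals.extend(path[labels == y0])
        counterfactuals.extend(path[labels != y0])
    return np.array(factuals), np.array(counterfactuals)

z0 = np.zeros(4)
factuals, counterfactuals = progressive_neighborhood(z0, black_box)
```

The two returned sets could then be labeled by the black box and used to fit a local surrogate model; with a real text autoencoder, `black_box` would wrap the decoder and the classifier under explanation.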
Taxonomy
Methods: Counterfactual Explanations
