Towards Verified and Targeted Explanations through Formal Methods

Hanchen David Wang; Diego Manzanas Lopez; Preston K. Robinette; Ipek Oguz; Taylor T. Johnson; Meiyi Ma

arXiv:2604.14209·cs.LG·April 17, 2026

Towards Verified and Targeted Explanations through Formal Methods

Hanchen David Wang, Diego Manzanas Lopez, Preston K. Robinette, Ipek Oguz, Taylor T. Johnson, Meiyi Ma

PDF

TL;DR

ViTaX is a formal XAI framework that provides targeted, mathematically guaranteed explanations for neural network decisions, especially in safety-critical applications, by identifying sensitive features and certifying robustness against specific class transitions.

Contribution

It introduces ViTaX, the first method to generate formally guaranteed, targeted explanations with mathematical robustness guarantees for specific class transitions in neural networks.

Findings

01

Achieved over 30% fidelity improvement on multiple datasets.

02

Generated minimal feature subsets sensitive to class transitions.

03

Provided formal reachability analysis to certify robustness against targeted perturbations.

Abstract

As deep neural networks are deployed in safety-critical domains such as autonomous driving and medical diagnosis, stakeholders need explanations that are interpretable but also trustworthy with formal guarantees. Existing XAI methods fall short: heuristic attribution techniques (e.g., LIME, Integrated Gradients) highlight influential features but offer no mathematical guarantees about decision boundaries, while formal methods verify robustness yet remain untargeted, analyzing the nearest boundary regardless of whether it represents a critical risk. In safety-critical systems, not all misclassifications carry equal consequences; confusing a "Stop" sign for a "60 kph" sign is far more dangerous than confusing it with a "No Passing" sign. We introduce ViTaX (Verified and Targeted Explanations), a formal XAI framework that generates targeted semifactual explanations with mathematical…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.