Trustworthy Actionable Perturbations
Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon

TL;DR
This paper introduces Trustworthy Actionable Perturbations (TAP), a framework for generating input modifications that reliably change true class probabilities, ensuring meaningful real-world impact rather than adversarial deception.
Contribution
The paper proposes a novel TAP framework with verification, new cost and reward definitions, and theoretical analysis, advancing counterfactual methods for trustworthy decision-making.
Findings
PAC-learnability results are established for the verification procedure
TAP effectively changes the true class probabilities
TAP outperforms previous counterfactual methods
Abstract
Counterfactuals, or modified inputs that lead to a different outcome, are an important tool for understanding the logic used by machine learning classifiers and how to change an undesirable classification. Even if a counterfactual changes a classifier's decision, however, it may not affect the true underlying class probabilities, i.e., the counterfactual may act like an adversarial attack and "fool" the classifier. We propose a new framework for creating modified inputs that change the true underlying probabilities in a beneficial way, which we call Trustworthy Actionable Perturbations (TAP). This includes a novel verification procedure to ensure that TAP change the true class probabilities instead of acting adversarially. Our framework also includes new cost, reward, and goal definitions that are better suited to effectuating change in the real world. We present PAC-learnability…
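The distinction the abstract draws, between a perturbation that changes the classifier's output and one that changes the true class probabilities, can be illustrated with a toy sketch. This is not the paper's verification procedure. Everything here is an assumption made for illustration: the ground-truth function `true_prob` (unknown in practice), the flawed `classifier_prob` that leans on a spurious feature, the helper `assess`, and the `min_gain` threshold.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def true_prob(x):
    # Hypothetical ground truth: only feature 0 affects the real outcome.
    return sigmoid(x[0])


def classifier_prob(x):
    # Hypothetical learned classifier that leans heavily on spurious feature 1.
    return sigmoid(0.5 * x[0] + 2.0 * x[1])


def assess(x, x_new, min_gain=0.1):
    """Compare the classifier's gain with the true-probability gain."""
    clf_gain = classifier_prob(x_new) - classifier_prob(x)
    true_gain = true_prob(x_new) - true_prob(x)
    return {
        "clf_gain": clf_gain,
        "true_gain": true_gain,
        # The classifier improves but the real probability does not.
        "fools_classifier": clf_gain >= min_gain and true_gain < min_gain,
    }


x = [-1.0, -1.0]
adversarial_like = [-1.0, 1.0]  # changes only the spurious feature
genuine = [1.0, -1.0]           # changes the feature that truly matters

print(assess(x, adversarial_like))
print(assess(x, genuine))
```

In this toy setup the first perturbation raises the classifier's score while leaving the true probability untouched, which is the adversarial-like behavior the abstract warns about. The second raises both. In a real setting the true probabilities are not directly observable, which is why a verification procedure of the kind the abstract describes is needed.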
Taxonomy
Topics: Systems Engineering Methodologies and Applications · Simulation Techniques and Applications
