Transparent Trade-offs between Properties of Explanations
Hiwot Belay Tadesse, Alihan Hüyük, Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez

TL;DR
This paper introduces a method that directly optimizes explanations of black-box models for desired properties. Unlike previous approaches, which merely encourage those properties, it gives users explicit control over the trade-offs between them.
Contribution
The paper proposes directly optimizing explanations for desired properties. This gives explicit control over trade-offs between properties and produces explanations that have the targeted properties more consistently.
Findings
Direct optimization yields explanations with better property adherence.
Users can customize explanations for specific tasks.
The method outperforms encouragement-based approaches in property consistency.
Abstract
When explaining black-box machine learning models, it's often important for explanations to have certain desirable properties. Most existing methods 'encourage' desirable properties in their construction of explanations. In this work, we demonstrate that these forms of encouragement do not consistently create explanations with the properties that are supposedly being targeted. Moreover, they do not allow for any control over which properties are prioritized when different properties are at odds with each other. We propose to directly optimize explanations for desired properties. Our direct approach not only produces explanations with optimal properties more consistently but also empowers users to control trade-offs between different properties, allowing them to create explanations with exactly what is needed for a particular task.
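As a rough illustration of what directly optimizing an explanation for properties could look like, the sketch below scores a local linear explanation on two property losses and minimizes their weighted sum. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `black_box` model, the choice of faithfulness (squared error on small perturbations) and complexity (squared norm of the attribution vector) as the two properties, the trade-off weight `lam`, and the function name `optimize_explanation`. With these particular choices the objective has a closed-form minimizer (a ridge-regression solve); the paper's actual formulation may differ.

```python
import numpy as np

def black_box(X):
    # Hypothetical stand-in for a black-box model (not from the paper).
    return np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2

def optimize_explanation(f, x0, lam, n_samples=200, scale=0.1, seed=0):
    """Return the attribution vector w that minimizes
        mean((deltas @ w - dy)**2) + lam * ||w||**2
    i.e. an assumed faithfulness loss plus lam times an assumed complexity loss."""
    rng = np.random.default_rng(seed)
    d = x0.shape[0]
    # Small random perturbations around the point being explained
    deltas = rng.normal(scale=scale, size=(n_samples, d))
    # Change in model output caused by each perturbation
    dy = f(x0 + deltas) - f(x0[None, :])
    # Setting the gradient of the objective to zero gives a linear system
    A = deltas.T @ deltas / n_samples + lam * np.eye(d)
    b = deltas.T @ dy / n_samples
    return np.linalg.solve(A, b)

x0 = np.array([0.5, 1.0])
for lam in (0.0, 0.01, 1.0):
    w = optimize_explanation(black_box, x0, lam)
    print(f"lam={lam}: w={w}, complexity={np.sum(w ** 2):.4f}")
```

In this sketch `lam` is the user-facing control: `lam=0` optimizes purely for faithfulness, while larger values trade faithfulness away for a smaller-norm explanation.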
Taxonomy
Topics: Semantic Web and Ontologies · Machine Learning in Materials Science
