Explanations that reveal all through the definition of encoding
Aahlad Puli, Nhi Nguyen, Rajesh Ranganath

TL;DR
This paper gives a formal definition of encoding explanations, separates them from non-encoding explanations, and develops an evaluation method, STRIPE-X, that ranks non-encoding explanations above encoding ones. Applied to a sentiment analysis task, STRIPE-X shows that LLM-generated explanations often encode.
Contribution
The paper defines encoding explanations via conditional dependence, shows that existing evaluation scores fail to rank non-encoding explanations above encoding ones, and introduces STRIPE-X, which ranks them correctly.
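To make the conditional-dependence idea concrete, here is a minimal sketch on a hypothetical toy problem. The data-generating process, the selection rule, and every function name below are assumptions made for illustration; this is not the paper's formal definition or its STRIPE-X procedure. Three binary inputs are uniform, the label copies x1 when x3 = 1 and x2 otherwise, and the explanation reveals only x1 or only x2, chosen by the hidden x3. The sketch compares how well the label can be predicted from the revealed value x1 = 1 alone with how well it can be predicted once we also know that x1 was the input selected.

```python
from fractions import Fraction
from itertools import product


def label(x1, x2, x3):
    """Toy label (assumed): copy x1 when x3 == 1, otherwise copy x2."""
    return x1 if x3 == 1 else x2


def selected_index(x1, x2, x3):
    """Toy explanation (assumed): reveal x1 (index 0) when x3 == 1, else x2 (index 1).

    x3 itself is never revealed, so the choice of index is the only trace of it.
    """
    return 0 if x3 == 1 else 1


def prob_y1_given(condition):
    """P(y = 1 | condition) under uniform binary inputs, by exact enumeration."""
    rows = [x for x in product((0, 1), repeat=3) if condition(*x)]
    return Fraction(sum(label(*x) for x in rows), len(rows))


# Knowing only that x1 == 1.
p_values_only = prob_y1_given(lambda x1, x2, x3: x1 == 1)

# Knowing that x1 == 1 AND that the explanation chose to reveal x1.
p_with_selection = prob_y1_given(
    lambda x1, x2, x3: x1 == 1 and selected_index(x1, x2, x3) == 0
)

print(p_values_only)     # 3/4
print(p_with_selection)  # 1
```

In this toy, the act of selecting x1 changes the label distribution even after conditioning on the value of x1, so the selection carries predictive information that the revealed value does not. That gap is the kind of extra predictive power the summary attributes to encoding explanations.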
Findings
Existing scores do not rank non-encoding explanations above encoding ones.
STRIPE-X ranks non-encoding explanations above encoding ones.
LLM-generated explanations for sentiment analysis often encode information, even when the LLM is prompted to produce non-encoding explanations.
Abstract
Feature attributions attempt to highlight what inputs drive predictive power. Good attributions or explanations are thus those that produce inputs that retain this predictive power; accordingly, evaluations of explanations score their quality of prediction. However, evaluations produce scores better than what appears possible from the values in the explanation for a class of explanations, called encoding explanations. Probing for encoding remains a challenge because there is no general characterization of what gives the extra predictive power. We develop a definition of encoding that identifies this extra predictive power via conditional dependence and show that the definition fits existing examples of encoding. This definition implies, in contrast to encoding explanations, that non-encoding explanations contain all the informative inputs used to produce the explanation, giving them a…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
