Linking Model Intervention to Causal Interpretation in Model Explanation
Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Kui Yu, Thuc Duy Le, Jixue Liu

TL;DR
This paper investigates when model intervention effects can be interpreted causally, linking model explanations to causal inference, and highlights limitations in environments with unobserved features.
Contribution
It establishes theoretical conditions under which model intervention effects have causal meaning, connecting model explanations to causal interpretation.
Findings
Identifies conditions for causal interpretation of intervention effects
Demonstrates limitations with unobserved features
Validates theorems through semi-synthetic experiments
Abstract
Intervention intuition is often used in model explanation, where the intervention effect of a feature on the outcome is quantified by the difference in the model's prediction when the feature value is changed from its current value to a baseline value. Such a model intervention effect is inherently associational. In this paper, we study the conditions under which an intuitive model intervention effect has a causal interpretation, i.e., when it indicates whether a feature is a direct cause of the outcome. This work links the model intervention effect to the causal interpretation of a model. Such an interpretation capability is important because it indicates to domain experts whether a machine learning model is trustworthy. The conditions also reveal the limitations of using a model intervention effect for causal interpretation in an environment with unobserved features. Experiments…
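As a rough illustration of the quantity the abstract describes, the sketch below computes a model intervention effect as the prediction at the current feature value minus the prediction after that feature is set to a baseline. The function name `model_intervention_effect`, the toy linear model, the baseline of 0.0, and the sign convention are all assumptions made for this example; they are not taken from the paper.

```python
from typing import Callable, List, Sequence


def model_intervention_effect(
    predict: Callable[[List[float]], float],
    x: Sequence[float],
    feature_index: int,
    baseline_value: float,
) -> float:
    """Prediction at the current value minus prediction at the baseline.

    Hypothetical helper: the sign convention (current minus baseline)
    is an assumption, not something the abstract specifies.
    """
    x_baseline = list(x)                      # copy so the input is not mutated
    x_baseline[feature_index] = baseline_value
    return predict(list(x)) - predict(x_baseline)


def toy_model(x: List[float]) -> float:
    # Hypothetical model: the prediction depends on feature 0 only.
    return 2.0 * x[0] + 0.0 * x[1] + 1.0


x = [3.0, 5.0]
effect_f0 = model_intervention_effect(toy_model, x, feature_index=0, baseline_value=0.0)
effect_f1 = model_intervention_effect(toy_model, x, feature_index=1, baseline_value=0.0)
print(effect_f0, effect_f1)
```

In this toy setting the model ignores feature 1, so its intervention effect is zero. The number only describes how the model's prediction responds to the change; whether it says anything about the feature being a direct cause of the real outcome depends on the conditions the paper studies.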
Taxonomy
Topics: Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
