Embodied Interpretability: Linking Causal Understanding to Generalization in Vision-Language-Action Models
Hanxin Zhang, Mingshuo Xu, Abdulqader Dhafer, Shigang Yue, Hongbiao Dong, Zhou Daniel Hao

TL;DR
This paper introduces two interventional attribution tools, the Interventional Significance Score (ISS) and the Nuisance Mass Ratio (NMR), to diagnose causal misalignment in vision-language-action (VLA) models and to relate that misalignment to generalization.
Contribution
It proposes ISS, an interventional masking procedure that admits unbiased estimation of the causal influence of visual regions on action predictions, and NMR, a scalar measure of how much attribution falls on task-irrelevant features.
Findings
Across diverse manipulation tasks, NMR predicts generalization behavior.
ISS yields more faithful explanations than existing interpretability methods.
Interventional attribution helps identify causal misalignment in embodied policies.
Abstract
Vision-Language-Action (VLA) policies often fail under distribution shift, suggesting that decisions may depend on spurious visual correlations rather than task-relevant causes. We formulate visual-action attribution as an interventional estimation problem. Accordingly, we introduce the Interventional Significance Score (ISS), an interventional masking procedure for estimating the causal influence of visual regions on action predictions, and the Nuisance Mass Ratio (NMR), a scalar measure of attribution to task-irrelevant features. We analyze the statistical properties of ISS and show that it admits unbiased estimation, and we characterize conditions under which action prediction error provides a valid proxy for causal influence. Experiments across diverse manipulation tasks indicate that NMR predicts generalization behavior and that ISS yields more faithful explanations than existing…
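The abstract describes ISS as a masking intervention on visual regions and NMR as a scalar share of attribution on task-irrelevant features, but it does not give either formula. The sketch below is therefore only an illustration under stated assumptions, not the paper's definitions: it scores each image patch by the L2 change in the predicted action when that patch is replaced by a constant baseline, then reports the fraction of total score that falls on patches labeled as nuisance. The names `patch_scores`, `nuisance_mass_ratio`, and `toy_policy`, the patch size, the baseline value, and the nuisance labeling are all hypothetical.

```python
import numpy as np

def patch_scores(policy, image, patch=4, baseline=0.0):
    """Assumed stand-in for an interventional score: for each patch,
    mask it with a constant baseline and measure the L2 change in the action."""
    h, w = image.shape[:2]
    reference = policy(image)
    scores = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            masked = image.copy()
            masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = baseline
            scores[i, j] = np.linalg.norm(policy(masked) - reference)
    return scores

def nuisance_mass_ratio(scores, nuisance_mask):
    """Assumed form of a nuisance ratio: share of total score on nuisance patches."""
    total = scores.sum()
    return float(scores[nuisance_mask].sum() / total) if total > 0 else 0.0

def toy_policy(image):
    """Hypothetical 2-D 'action' that reads the top-left and bottom-right quadrants."""
    return np.array([image[:4, :4].sum(), image[4:, 4:].sum()], dtype=float)

image = np.ones((8, 8))
scores = patch_scores(toy_policy, image)

# Suppose only the bottom-right patch is task-irrelevant (a labeling assumption).
nuisance = np.array([[False, False],
                     [False, True]])
nmr = nuisance_mass_ratio(scores, nuisance)
print(scores)
print(nmr)
```

In this toy setup the policy depends equally on one task-relevant patch and one nuisance patch, so half of the attribution mass lands on the nuisance region. A policy that ignored the nuisance patch entirely would score 0 under this assumed ratio.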
