TL;DR
This paper introduces an event-grounded interpretability pipeline for vision-language-action (VLA) policies. It links sparse autoencoder (SAE) features to behavioral events in closed-loop rollouts rather than to text contexts, so that features can be interpreted and causally tested against robot behavior.
Contribution
The paper proposes an event-grounded method for analyzing SAE features in VLA policies, aimed at causal interpretability that transfers across architectures and to real robots.
Findings
Event-grounded ranking yields strong causal effects on OpenVLA.
The approach transfers to continuous action chunks of π_{0.5}.
Usability varies with architecture and intervention site, revealing safety and interpretability limits.
Abstract
Vision-Language-Action (VLA) policies translate language and visual inputs into robot actions, where their hidden representations directly shape closed-loop behavior. However, mechanistic interpretability tools from language and vision-language models do not transfer cleanly to VLAs: outputs are robot actions rather than human-readable tokens, and interventions can only be tested via expensive closed-loop rollouts. We propose an event-grounded interpretability pipeline that anchors SAE feature analysis to behavioral events rather than text contexts. End-effector keyframes are clustered within each task using visual, state, and temporal cues, linking SAE features to behaviorally salient events and, via optional VLM annotations, to semantic context. To our knowledge, our pipeline is among the first to ground SAE-based VLA analysis in closed-loop behavioral events. Across two simulation…
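The clustering and feature-linking steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cluster_keyframes`, `rank_features_by_event`), the cue weights, the plain k-means clusterer, and the standardized mean-gap score are all assumptions chosen for illustration, and the paper may use different descriptors, clustering, and ranking criteria.

```python
import numpy as np


def cluster_keyframes(visual, state, time, k, w=(1.0, 1.0, 1.0), iters=20, seed=0):
    """Group keyframes of one task into k event clusters.

    Assumption: each cue block (visual embedding, robot state, normalized
    timestep) is z-scored, weighted, concatenated, and clustered with a
    toy k-means. All of these choices are hypothetical.
    """
    def zscore(x):
        x = np.asarray(x, dtype=float)
        if x.ndim == 1:
            x = x[:, None]
        return (x - x.mean(0)) / (x.std(0) + 1e-8)

    X = np.hstack([w[0] * zscore(visual), w[1] * zscore(state), w[2] * zscore(time)])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each keyframe to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(0)
    return labels


def rank_features_by_event(acts, labels, event):
    """Rank SAE features by how selectively they fire on one event cluster.

    Assumption: selectivity is the gap between mean activation inside and
    outside the cluster, divided by the feature's overall standard deviation.
    """
    acts = np.asarray(acts, dtype=float)
    inside = labels == event
    if not inside.any() or inside.all():
        raise ValueError("event must cover some, but not all, keyframes")
    gap = acts[inside].mean(0) - acts[~inside].mean(0)
    score = gap / (acts.std(0) + 1e-8)
    return np.argsort(-score), score
```

Here `acts` is assumed to be a `(num_keyframes, num_features)` array of SAE activations recorded at the same keyframes that were clustered. The top-ranked features for an event would be candidates for a causal intervention, which still has to be checked in closed-loop rollouts; a correlational score like this one does not establish a causal effect by itself.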
