Observing and Controlling Features in Vision-Language-Action Models
Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone

TL;DR
This paper introduces methods to observe and control features within Vision-Language-Action Models, enabling real-time, interpretable, and lightweight steering of robotic behaviors without the need for fine-tuning.
Contribution
The paper proposes two concepts, feature-observability and feature-controllability, and provides techniques for linearly observing and intervening on VLA internal representations.
Findings
Targeted linear interventions can steer robot behavior reliably.
VLAs have interpretable internal structures suitable for online adaptation.
Interventions preserve closed-loop capabilities during real-time control.
Abstract
Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit higher complexity due to their multi-modal inputs/outputs and often hybrid nature of transformer and diffusion heads. This is part of the reason why insights from mechanistic interpretability in LLMs, which explain how the internal model representations relate to their output behavior, do not trivially transfer to VLA counterparts. In this work, we propose to close this gap by introducing and analyzing two main concepts: feature-observability and feature-controllability. In particular, we first study features that are linearly encoded in representation space, and show how they can be observed by means of a linear classifier. Then, we use a minimal linear intervention grounded in optimal control…
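The two ideas in the abstract, reading a feature out with a linear classifier and then steering it with a minimal linear intervention, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that a feature is read from a hidden representation `x` by a linear probe `w @ x + b`, and it uses the closed-form minimum-L2-norm shift that moves that readout to a target value. The names `fit_linear_probe`, `observe`, and `minimal_intervention` are hypothetical, as is the ridge-regression fit used in place of whatever classifier the paper trains.

```python
import numpy as np


def fit_linear_probe(X, y, ridge=1e-3):
    """Fit a ridge-regularized least-squares probe so that X @ w + b approximates y.

    X: (n_samples, d) array of hidden representations (assumed already extracted).
    y: (n_samples,) array of feature values to be observed.
    """
    # Append a constant column so the bias is learned jointly with the weights.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # Regularized normal equations; the bias is regularized too, for simplicity.
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    theta = np.linalg.solve(A, Xb.T @ y)
    return theta[:-1], theta[-1]


def observe(x, w, b):
    """Linear readout of the feature from a single representation x."""
    return float(x @ w + b)


def minimal_intervention(x, w, b, target):
    """Return the representation closest to x (in L2 norm) whose readout equals target.

    For a linear readout, the smallest shift that closes the gap lies along w:
        delta = (target - (w @ x + b)) * w / (w @ w)
    """
    gap = target - observe(x, w, b)
    return x + gap * w / (w @ w)
```

In use, `fit_linear_probe` would be run once offline on recorded activations, and `minimal_intervention` would be applied to the chosen layer's activation at each control step. Because the shift is a single scaled vector addition, it is cheap enough to apply online, which is the sense in which such interventions are lightweight.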
Taxonomy
Topics: Multimodal Machine Learning Applications · Language and cultural evolution · Domain Adaptation and Few-Shot Learning
