VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
Ravi Ranjan, Agoritsa Polyzou

TL;DR
VLA-Forget is a hybrid unlearning framework for embodied vision-language-action models that effectively removes undesirable behaviors while preserving perception and reasoning capabilities.
Contribution
It introduces a staged, multi-objective unlearning method that jointly optimizes for forgetting, perception preservation, and reasoning retention in embodied models.
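As a rough illustration of what a staged, multi-objective unlearning objective can look like, the sketch below combines three scalar loss terms with stage-dependent weights. Everything in it is an assumption made for illustration: the names `StageWeights`, `STAGES`, and `combined_loss`, the two-stage schedule, the weight values, and the use of a negated forget loss (gradient ascent on the forget set) are not taken from the paper, whose actual formulation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageWeights:
    """Hypothetical per-stage weights for the three objectives."""
    forget: float
    perception: float
    reasoning: float


# Hypothetical two-stage schedule (assumption, not from the paper):
# first emphasize forgetting, then emphasize preserving utility.
STAGES = {
    "forget_heavy": StageWeights(forget=1.0, perception=0.5, reasoning=0.5),
    "retain_heavy": StageWeights(forget=0.2, perception=1.0, reasoning=1.0),
}


def combined_loss(forget_loss: float,
                  perception_loss: float,
                  reasoning_loss: float,
                  w: StageWeights) -> float:
    """Weighted sum of the three objectives for one training stage.

    The forget term enters with a negative sign, so minimizing the total
    pushes the loss on the forget set up (one common unlearning choice,
    assumed here), while the two retention terms are minimized as usual.
    """
    return (-w.forget * forget_loss
            + w.perception * perception_loss
            + w.reasoning * reasoning_loss)


if __name__ == "__main__":
    for name, weights in STAGES.items():
        total = combined_loss(2.0, 1.0, 1.0, weights)
        print(f"{name}: {total:.2f}")
```

In a real training loop the three inputs would be tensors computed on a forget set and on retention sets, and the stage would switch after a fixed number of steps or once a forgetting criterion is met; both choices are left open here.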
Findings
Improves forgetting efficacy by 10%
Improves preservation of perceptual specificity by 22%
Improves retention of reasoning and task success by 9%
Abstract
Vision-language-action (VLA) models are emerging as embodied foundation models for robotic manipulation, but their deployment introduces a new unlearning challenge: removing unsafe, spurious, or privacy-sensitive behaviors without degrading perception, language grounding, and action control. In OpenVLA-style policies, behavior is produced through a fused visual encoder, a cross-modal projector, and a language backbone that predicts tokenized robot actions, so undesirable knowledge can be distributed across perception, alignment, and reasoning/action layers rather than confined to a single module. Consequently, partial unlearning applied only to the vision stack or only to the language backbone is often insufficient, while conventional unlearning baselines designed for standalone vision or language models may leave residual forgetting or incur unnecessary utility loss in embodied…
