AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models

Hyeongjun Heo; Seungyeon Woo; Sang Min Kim; Junho Kim; Junho Lee; Yonghyeon Lee; Young Min Kim

arXiv:2603.05868·cs.RO·March 9, 2026

AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models

Hyeongjun Heo, Seungyeon Woo, Sang Min Kim, Junho Kim, Junho Lee, Yonghyeon Lee, Young Min Kim

PDF

Open Access

TL;DR

This paper introduces a zero-shot camera adaptation method for Vision-Language-Action models that improves viewpoint robustness in robotic manipulation without additional training or data, using real-time view synthesis.

Contribution

It proposes a novel zero-shot framework that virtually adjusts camera views at test time, eliminating the need for fine-tuning or extra data, and enhances model robustness in unstructured environments.

Findings

01

Outperforms data augmentation baselines on LIBERO benchmark

02

Improves viewpoint robustness in real-world robotic scenarios

03

Effective with various camera configurations and movements

Abstract

Despite remarkable progress in Vision-Language-Action models (VLAs) for robot manipulation, these large pre-trained models require fine-tuning to be deployed in specific environments. These fine-tuned models are highly sensitive to camera viewpoint changes that frequently occur in unstructured environments. In this paper, we propose a zero-shot camera adaptation framework without additional demonstration data, policy fine-tuning, or architectural modification. Our key idea is to virtually adjust test-time camera observations to match the training camera configuration in real-time. For that, we use a recent feed-forward novel view synthesis model which outputs high-quality target view images, handling both extrinsic and intrinsic parameters. This plug-and-play approach preserves the pre-trained capabilities of VLAs and applies to any RGB-based policy. Through extensive experiments on the…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning