Learning to Act Robustly with View-Invariant Latent Actions

Youngjoon Jeong; Junha Chun; Taesup Kim

arXiv:2601.02994·cs.RO·January 7, 2026

TL;DR

This paper introduces VILA, a method that learns view-invariant, physically grounded latent actions so that vision-based robotic policies remain robust to viewpoint changes, generalize to unseen viewpoints, and transfer to new tasks.

Contribution

VILA models latent actions that capture transition patterns grounded in physical dynamics and aligns them across viewpoints using an objective guided by ground-truth action sequences, rather than learning invariance from visual appearance alone.
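
To make the idea of action-guided cross-view alignment concrete, here is a minimal NumPy sketch. It is not the paper's objective, which this summary does not specify. The function name `action_guided_alignment_loss`, the temperature `tau`, and the soft-contrastive form are all assumptions chosen for illustration: latent actions from two camera views are compared by cosine similarity, and the target distribution over matches is derived from how close the ground-truth actions are.

```python
import numpy as np

def _softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_guided_alignment_loss(z_a, z_b, actions, tau=0.1):
    """Hypothetical soft-contrastive alignment loss (illustrative only).

    z_a, z_b : (N, D) latent actions for the same N transitions seen
               from two different viewpoints.
    actions  : (N, K) ground-truth actions for those transitions.
    tau      : temperature (an assumed hyperparameter).
    """
    # Normalize latents so the dot product is a cosine similarity.
    z_a = z_a / (np.linalg.norm(z_a, axis=1, keepdims=True) + 1e-12)
    z_b = z_b / (np.linalg.norm(z_b, axis=1, keepdims=True) + 1e-12)
    logits = z_a @ z_b.T / tau                      # (N, N) cross-view scores

    # Soft targets: transitions with similar ground-truth actions
    # should have similar latent actions, regardless of viewpoint.
    diff = actions[:, None, :] - actions[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)              # (N, N)
    targets = _softmax(-sq_dist / tau, axis=1)

    # Cross-entropy between action-derived targets and latent similarities.
    log_probs = np.log(_softmax(logits, axis=1) + 1e-12)
    return float(-(targets * log_probs).sum(axis=1).mean())

# Toy usage with random data standing in for encoder outputs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4))
lat_view_a = rng.normal(size=(8, 16))
lat_view_b = lat_view_a + 0.05 * rng.normal(size=(8, 16))
print(action_guided_alignment_loss(lat_view_a, lat_view_b, acts))
```

Under these assumptions the loss is small when the two views produce matching latent actions for the same transition and grows when the pairing is broken, which is the qualitative behavior an alignment objective of this kind would need.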

Findings

01. VILA enables policies to generalize to unseen viewpoints.

02. VILA improves transfer to new tasks in real-world experiments.

03. VILA enhances robustness of vision-based robotic policies.

Abstract

Vision-based robotic policies often struggle with even minor viewpoint changes, underscoring the need for view-invariant visual representations. This challenge becomes more pronounced in real-world settings, where viewpoint variability is unavoidable and can significantly disrupt policy performance. Existing methods typically learn invariance from multi-view observations at the scene level, but such approaches rely on visual appearance and fail to incorporate the physical dynamics essential for robust generalization. We propose View-Invariant Latent Action (VILA), which models a latent action capturing transition patterns across trajectories to learn view-invariant representations grounded in physical dynamics. VILA aligns these latent actions across viewpoints using an action-guided objective based on ground-truth action sequences. Experiments in both simulation and the real world show…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications