MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction

Jung Min Lee; Dohyeok Lee; Seokhun Ju; Taehyun Cho; Jin Woo Koo; Li Zhao; Sangwoo Hong; Jungwoo Lee

arXiv:2602.03668·cs.RO·May 5, 2026

TL;DR

MVP-LAM learns latent actions from multi-view videos through cross-viewpoint reconstruction, making them more informative about ground-truth actions and improving both action prediction and downstream manipulation performance.

Contribution

The paper introduces MVP-LAM, a latent action model trained on multi-view videos with a cross-viewpoint reconstruction objective, so that latent actions used as pseudo-labels for VLA pretraining stay informative about the underlying ground-truth actions.

Findings

01 Higher mutual information with ground-truth actions.

02 Improved action prediction accuracy.

03 Enhanced downstream manipulation performance.

Abstract

Latent actions learned from diverse human videos serve as pseudo-labels for vision-language-action (VLA) pretraining, but they provide effective supervision only if they remain informative about the underlying ground-truth actions, which are themselves inaccessible in such videos. We propose the Multi-ViewPoint Latent Action Model (MVP-LAM), which learns latent actions that are highly informative about ground-truth actions from multi-view videos. MVP-LAM trains latent actions with a cross-viewpoint reconstruction objective: a latent action inferred from one view must explain the future in another view, which reduces reliance on viewpoint-specific cues. On Bridge V2, MVP-LAM produces more action-centric latent actions, achieving higher mutual information with ground-truth actions and improved action prediction, including…
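
The cross-viewpoint reconstruction idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `encode` and `decode` functions, the two-camera setup, the squared-error loss, and the presence of a separate same-view term are all assumptions made for illustration. The one element taken from the abstract is the swap, in which a latent action inferred from one view is used to predict the future frame of the other view.

```python
def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def encode(enc_w, obs_t, obs_next):
    """Hypothetical encoder: latent action from a frame pair of one view."""
    return linear(enc_w, obs_t + obs_next)  # list concatenation

def decode(dec_w, obs_t, latent):
    """Hypothetical decoder: next frame from the current frame and a latent."""
    return linear(dec_w, obs_t + latent)

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def cross_view_losses(enc_w, dec_w, view_a, view_b):
    """Return (same_view_loss, cross_view_loss) for one transition.

    view_a and view_b are (obs_t, obs_next) pairs showing the same
    transition from two cameras (an assumed data layout).
    """
    (a_t, a_next), (b_t, b_next) = view_a, view_b
    z_a = encode(enc_w, a_t, a_next)
    z_b = encode(enc_w, b_t, b_next)
    # Same-view term (assumed): each latent explains its own view's future.
    same = mse(decode(dec_w, a_t, z_a), a_next) + mse(decode(dec_w, b_t, z_b), b_next)
    # Cross-view term: the latent from one view must explain the other view's future.
    cross = mse(decode(dec_w, a_t, z_b), a_next) + mse(decode(dec_w, b_t, z_a), b_next)
    return same, cross
```

In this sketch, minimizing the cross-view term penalizes latents that encode camera-specific appearance, since such information does not help predict the other camera's next frame.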

Code & Models

Repositories

https://jmsnu.github.io