LA-Pose: Latent Action Pretraining Meets Pose Estimation

Zhengqing Wang; Saurabh Nair; Prajwal Chidananda; Pujith Kachana; Samuel Li; Matthew Brown; Yasutaka Furukawa

arXiv:2604.27448 · cs.CV · May 1, 2026

TL;DR

LA-Pose introduces a self-supervised pretraining approach using inverse-dynamics models to improve camera pose estimation, achieving high accuracy with less labeled data.

Contribution

LA-Pose repurposes latent action features learned through inverse-dynamics pretraining as inputs to a camera pose estimator, reducing reliance on extensive 3D annotations.

Findings

01 LA-Pose outperforms recent feed-forward methods by over 10% on Waymo and PandaSet.

02 It achieves competitive or superior accuracy with significantly less labeled data.

03 It is the first work to demonstrate that inverse-dynamics self-supervised learning is effective for pose estimation.

Abstract

This paper revisits camera pose estimation through the lens of self-supervised pretraining, focusing on inverse-dynamics pretraining as a scalable alternative to the current trend of fully supervised training with 3D annotations. Concretely, we employ inverse- and forward-dynamics models to learn latent action representations from large-scale driving videos, similar to Genie. Our idea is simple yet effective. Existing methods use latent actions in their original capacity, that is, as action conditioning for world models or as proxies for robot action parameters in policy networks. Our method, dubbed LA-Pose, repurposes the latent action features as inputs to a camera pose estimator, which is finetuned on a limited set of high-quality 3D annotations. This formulation enables accurate and generalizable pose prediction while maintaining feed-forward efficiency. Extensive experiments on driving…
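
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single linear layers, the feature sizes, the class name `LatentActionSketch`, and the 6-value pose output (assumed here to be 3 rotation plus 3 translation parameters) are all hypothetical choices made only to show how an inverse-dynamics model, a forward-dynamics model, and a pose head might connect.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentActionSketch:
    """Toy stand-in for the pipeline; every size here is an assumption."""

    def __init__(self, feat_dim=8, action_dim=4, pose_dim=6, seed=0):
        rng = random.Random(seed)
        # Inverse dynamics: (frame_t, frame_t1) features -> latent action.
        self.w_inverse = random_matrix(action_dim, 2 * feat_dim, rng)
        # Forward dynamics: (frame_t, latent action) -> predicted frame_t1.
        self.w_forward = random_matrix(feat_dim, feat_dim + action_dim, rng)
        # Pose head: latent action -> relative camera pose (assumed 6 values).
        self.w_pose = random_matrix(pose_dim, action_dim, rng)

    def latent_action(self, f_t, f_t1):
        # Concatenate the two frame feature vectors and encode the change.
        return linear(self.w_inverse, f_t + f_t1)

    def pretrain_loss(self, f_t, f_t1):
        # Self-supervised objective: reconstruct the next frame's features
        # from the current frame and the inferred latent action (no labels).
        z = self.latent_action(f_t, f_t1)
        pred = linear(self.w_forward, f_t + z)
        return sum((p - t) ** 2 for p, t in zip(pred, f_t1)) / len(f_t1)

    def predict_pose(self, f_t, f_t1):
        # Finetuning stage: the latent action becomes the pose head's input.
        return linear(self.w_pose, self.latent_action(f_t, f_t1))


model = LatentActionSketch()
frame_t = [0.1 * i for i in range(8)]
frame_t1 = [0.1 * i + 0.05 for i in range(8)]
loss = model.pretrain_loss(frame_t, frame_t1)
pose = model.predict_pose(frame_t, frame_t1)
```

In this sketch the pretraining loss needs only pairs of consecutive frames, while the pose head is the only part that would require pose labels, which mirrors the abstract's claim of needing a limited set of 3D annotations.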
