WildPose: A Unified Framework for Robust Pose Estimation in the Wild

Jianhao Zheng; Liyuan Zhu; Zihan Zhu; Iro Armeni

arXiv:2605.12774 · cs.CV · May 14, 2026

TL;DR

WildPose is a unified monocular pose estimation framework that effectively handles dynamic environments while maintaining high accuracy on static scenes, combining perceptual models with differentiable bundle adjustment.

Contribution

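WildPose introduces a framework that connects feedforward perceptual models with end-to-end optimization through differentiable bundle adjustment, using a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, for robust pose estimation in both dynamic and static environments.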

Findings

01 WildPose outperforms prior methods on dynamic, static, and low-ego-motion datasets.

02 The framework maintains state-of-the-art performance across multiple benchmarks.

03 Extensive experiments validate the robustness and accuracy of WildPose.

Abstract

Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask…
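To illustrate why a motion mask helps pose optimization, here is a minimal sketch of a weighted Gauss-Newton solve, the basic building block of bundle adjustment. It is not WildPose's update operator: it is a toy 2D rigid alignment, and the function names, the synthetic data, and the binary per-point weights standing in for a motion mask are all assumptions made for illustration.

```python
import numpy as np

def residuals_and_jacobian(pose, src, dst):
    """Residuals and Jacobian for a 2D rigid pose (theta, tx, ty). Hypothetical helper."""
    theta, tx, ty = pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    pred = src @ R.T + np.array([tx, ty])       # R @ p_i + t for every point
    r = (pred - dst).reshape(-1)                # stacked as [x0, y0, x1, y1, ...]
    dR = np.array([[-s, -c], [c, -s]])          # derivative of R w.r.t. theta
    J = np.zeros((2 * len(src), 3))
    J[:, 0] = (src @ dR.T).reshape(-1)          # d residual / d theta
    J[0::2, 1] = 1.0                            # d x-residual / d tx
    J[1::2, 2] = 1.0                            # d y-residual / d ty
    return r, J

def weighted_gauss_newton(src, dst, weights, iters=10):
    """Minimise sum_i w_i * ||R p_i + t - q_i||^2 with Gauss-Newton steps."""
    pose = np.zeros(3)
    w = np.repeat(weights, 2)                   # one weight per residual row
    for _ in range(iters):
        r, J = residuals_and_jacobian(pose, src, dst)
        H = J.T @ (w[:, None] * J)              # weighted normal matrix
        g = J.T @ (w * r)                       # weighted gradient
        pose = pose - np.linalg.solve(H + 1e-9 * np.eye(3), g)
    return pose

# Synthetic scene (assumed data): 40 points, the first 10 move independently.
rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(40, 2))
true_pose = np.array([0.3, 0.5, -0.2])
c, s = np.cos(true_pose[0]), np.sin(true_pose[0])
dst = src @ np.array([[c, -s], [s, c]]).T + true_pose[1:]
dst[:10] += np.array([1.0, 0.5])                # "dynamic" points violate the static assumption

mask = np.ones(40)
mask[:10] = 0.0                                 # stand-in for a predicted motion mask

masked_pose = weighted_gauss_newton(src, dst, mask)
unmasked_pose = weighted_gauss_newton(src, dst, np.ones(40))
```

With the moving points weighted out, the solve recovers the true pose on this toy data, while the unweighted solve is biased by them. A learned system would predict soft weights rather than use a ground-truth binary mask as done here.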
