WildPose: A Unified Framework for Robust Pose Estimation in the Wild
Jianhao Zheng, Liyuan Zhu, Zihan Zhu, Iro Armeni

TL;DR
WildPose is a unified monocular pose estimation framework that effectively handles dynamic environments while maintaining high accuracy on static scenes, combining perceptual models with differentiable bundle adjustment.
Contribution
WildPose connects the perceptual frontend of feedforward models with the end-to-end optimization of differentiable bundle adjustment (BA), using a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone.
Findings
WildPose outperforms prior methods on dynamic, static, and low-ego-motion datasets.
It maintains state-of-the-art performance on static and low-ego-motion datasets, so robustness to dynamic content does not come at the cost of static-scene accuracy.
The experiments span dynamic, static, and low-ego-motion benchmarks.
Abstract
Estimating camera pose in dynamic environments is a critical challenge, as most visual SLAM and SfM methods assume static scenes. While recent dynamic-aware methods exist, they are often not unified: semantic-based approaches are brittle, per-sequence optimization methods fail on short sequences, and other learned models may degrade on static-only scenes. We present WildPose, a unified monocular pose estimation framework that is robust in dynamic environments while maintaining state-of-the-art performance on static and low-ego-motion datasets. Our key insight is to connect two powerful paradigms in modern 3D vision: the rich perceptual frontend of feedforward models and the end-to-end optimization of differentiable bundle adjustment (BA). We achieve this with a 3D-aware update operator built on a frozen, pre-trained MASt3R feature backbone, together with a high-capacity motion mask…
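To give a feel for why a motion mask matters in the optimization, here is a minimal toy sketch. It is not WildPose's method: instead of bundle adjustment over 3D poses it solves a 2D translation-only weighted least-squares problem, and the function name `weighted_translation`, the point counts, and the hand-set binary mask are all assumptions made for illustration. It only shows that down-weighting correspondences on moving objects removes the bias they would otherwise introduce into the estimated camera motion.

```python
import numpy as np

def weighted_translation(src, dst, weights):
    """Closed-form minimizer of sum_i w_i * ||src_i + t - dst_i||^2 over t.

    Toy stand-in for a pose solve: the unknown is a 2D translation only.
    """
    w = np.asarray(weights, dtype=float)[:, None]
    return (w * (dst - src)).sum(axis=0) / w.sum()

rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(50, 2))   # points seen in frame A
true_t = np.array([0.3, -0.1])               # hypothetical camera-induced shift
dst = src + true_t                           # the same points in frame B

# Hypothetical dynamic content: the first 10 points move on their own.
dynamic = np.zeros(50, dtype=bool)
dynamic[:10] = True
dst[dynamic] += np.array([1.0, 1.0])

# Without a mask, the moving points bias the estimate.
t_unmasked = weighted_translation(src, dst, np.ones(50))

# With a (here, ground-truth) motion mask, dynamic points get zero weight.
t_masked = weighted_translation(src, dst, (~dynamic).astype(float))

print("unmasked:", t_unmasked)
print("masked:  ", t_masked)
```

In this toy setup the masked estimate recovers `true_t`, while the unmasked one is pulled toward the moving points. In a learned system the weights would be predicted rather than given, and would typically be soft values rather than a binary mask.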
