2D-3D Pose Tracking with Multi-View Constraints
Huai Yu, Kuangyi Chen, Wen Yang, Sebastian Scherer, Gui-Song Xia

TL;DR
This paper introduces a multi-view 2D-3D pose tracking framework that leverages adjacent frame relationships and multi-view constraints to improve camera localization accuracy in 3D LiDAR maps.
Contribution
It proposes a novel 2D-3D pose tracking method combining a hybrid flow network and multi-view geometrical constraints with a cross-modal consistency loss.
Findings
Outperforms existing frame-by-frame 2D-3D pose tracking methods
Achieves state-of-the-art results on KITTI and Argoverse datasets
Demonstrates improved stability and accuracy in camera localization
Abstract
Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on addressing the cross-modal gaps, estimating camera poses frame by frame without considering the relationship between adjacent frames, which makes the pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences between adjacent frames using 2D-2D feature matching, establishing multi-view geometrical constraints for simultaneously estimating multiple camera poses. Specifically, we propose a new 2D-3D pose tracking framework, which consists of a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the…
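The multi-view idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera, a rotation restricted to the optical axis to keep the code short, and hypothetical names (`rot_z`, `project`, `multi_view_residuals`, `cost`) and toy data invented for this example. The point it shows is that when 2D-2D matches link the pixels of the same map point in two adjacent frames, the reprojection residuals of both frames can be stacked into one cost, so both poses are constrained together rather than frame by frame.

```python
import math

def rot_z(angle):
    """Rotation about the optical (z) axis; a simplification for this sketch."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point, pose, f, cx, cy):
    """Pinhole projection of a 3D map point into a camera with pose (R, t)."""
    R, t = pose
    x, y, z = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * x / z + cx, f * y / z + cy)

def multi_view_residuals(points, obs_a, obs_b, pose_a, pose_b, f, cx, cy):
    """Stack reprojection residuals of shared 3D points over two adjacent frames.

    obs_a: pixels in frame A from 2D-3D correspondences.
    obs_b: pixels in frame B linked to obs_a by 2D-2D matching (assumed given).
    """
    residuals = []
    for p, ua, ub in zip(points, obs_a, obs_b):
        for pose, u in ((pose_a, ua), (pose_b, ub)):
            proj = project(p, pose, f, cx, cy)
            residuals.extend([proj[0] - u[0], proj[1] - u[1]])
    return residuals

def cost(residuals):
    """Sum of squared residuals, the quantity a back-end optimizer would minimize."""
    return sum(r * r for r in residuals)

# Toy data (hypothetical): three map points seen by two adjacent frames.
F, CX, CY = 500.0, 320.0, 240.0
points = [(1.0, 0.5, 5.0), (-1.0, 0.2, 6.0), (0.3, -0.4, 4.0)]
true_a = (rot_z(0.0), [0.0, 0.0, 0.0])
true_b = (rot_z(0.02), [0.1, 0.0, -0.5])
obs_a = [project(p, true_a, F, CX, CY) for p in points]
obs_b = [project(p, true_b, F, CX, CY) for p in points]

# A wrong guess for frame B raises the joint cost; the true poses give zero.
wrong_b = (rot_z(0.05), [0.3, 0.0, -0.5])
cost_true = cost(multi_view_residuals(points, obs_a, obs_b, true_a, true_b, F, CX, CY))
cost_wrong = cost(multi_view_residuals(points, obs_a, obs_b, true_a, wrong_b, F, CX, CY))
```

In a full system, a nonlinear least-squares solver would minimize this stacked cost over both poses. The sketch only evaluates it.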
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
