DINO-VO: Learning Where to Focus for Enhanced State Estimation
Qi Chen, Guanghao Li, Sijia Hu, Xin Gao, Junpeng Ma, Xiangyang Xue, Jian Pu

TL;DR
DINO-VO is an end-to-end monocular visual odometry system that uses adaptive patch selection and multi-task learning to improve accuracy and generalization across diverse environments.
Contribution
DINO-VO introduces a differentiable adaptive patch selector and pairs a multi-task feature extraction module with a differentiable bundle adjustment (BA) module that uses inverse depth priors, improving robustness and generalization in visual odometry.
Findings
Achieves state-of-the-art tracking accuracy on multiple datasets.
Demonstrates strong generalization across synthetic, indoor, and outdoor environments.
Outperforms existing VO systems in accuracy and robustness.
Abstract
We present DINO Patch Visual Odometry (DINO-VO), an end-to-end monocular visual odometry system with strong scene generalization. Current Visual Odometry (VO) systems often rely on heuristic feature extraction strategies, which can degrade accuracy and robustness, particularly in large-scale outdoor environments. DINO-VO addresses these limitations by incorporating a differentiable adaptive patch selector into the end-to-end pipeline, improving the quality of extracted patches and enhancing generalization across diverse datasets. Additionally, our system integrates a multi-task feature extraction module with a differentiable bundle adjustment (BA) module that leverages inverse depth priors, enabling the system to learn and utilize appearance and geometric information effectively. This integration bridges the gap between feature learning and state estimation. Extensive experiments on the…
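The abstract does not spell out how the adaptive patch selector works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical per-cell score map (as a scoring network might produce), relaxes it with a temperature softmax, and then keeps the top-k cells as patch centers. The names `soft_patch_scores`, `select_patches`, and `temperature` are made up for this example. In a real training framework the softmax weights would carry gradients back to the scoring network; the hard top-k step itself is not differentiable and would need a relaxation or straight-through estimator.

```python
import math


def soft_patch_scores(score_map, temperature=1.0):
    """Turn a 2D grid of raw scores into a same-shape grid of softmax probabilities.

    `score_map` is a hypothetical stand-in for the output of a learned scoring head.
    """
    width = len(score_map[0])
    flat = [s / temperature for row in score_map for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in flat]
    total = sum(exps)
    probs = [e / total for e in exps]
    return [probs[i * width:(i + 1) * width] for i in range(len(score_map))]


def select_patches(score_map, k, temperature=1.0):
    """Return the k highest-probability cells as (row, col, probability) tuples.

    Assumption: a hard top-k over the relaxed scores; ties keep row-major order.
    """
    probs = soft_patch_scores(score_map, temperature)
    cells = [(p, r, c) for r, row in enumerate(probs) for c, p in enumerate(row)]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [(r, c, p) for p, r, c in cells[:k]]


# Toy 2x3 score map; the values are invented for illustration only.
scores = [[0.1, 2.0, 0.3],
          [1.5, 0.0, 0.2]]
chosen = select_patches(scores, k=2)
print([(r, c) for r, c, _ in chosen])
```

Lowering `temperature` concentrates the probability mass on the highest-scoring cells, while raising it spreads the mass more evenly; the set of selected cells stays the same either way because softmax preserves the ordering of the scores.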
