Keyframe-Based Feed-Forward Visual Odometry

Weichen Dai; Wenhan Su; Da Kong; Yuhang Ming; Wanzeng Kong

arXiv:2601.16020 · cs.CV · January 23, 2026

Open Access

TL;DR

This paper introduces a reinforcement learning-based keyframe selection method for feed-forward visual odometry, improving efficiency and accuracy by reducing redundancy and leveraging adaptive keyframe policies.

Contribution

It presents a novel keyframe-based approach that integrates reinforcement learning to adaptively select keyframes, enhancing existing foundation model-based visual odometry methods.

Findings

01 Significant performance improvements over state-of-the-art methods

02 Efficient reduction of computational redundancy

03 Robustness across multiple real-world datasets

Abstract

The emergence of visual foundation models has revolutionized visual odometry (VO) and SLAM, enabling pose estimation and dense reconstruction within a single feed-forward network. However, unlike traditional pipelines that leverage keyframe methods to enhance efficiency and accuracy, current foundation-model-based methods, such as VGGT-Long, typically process raw image sequences indiscriminately. This leads to computational redundancy and to degraded performance caused by low inter-frame parallax, which provides limited contextual stereo information. Integrating traditional geometric heuristics into these methods is non-trivial, as their performance depends on high-dimensional latent representations rather than explicit geometric metrics. To bridge this gap, we propose a novel keyframe-based feed-forward VO. Instead of relying on hand-crafted rules, our approach employs reinforcement…
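
For context, the kind of hand-crafted geometric rule that the abstract contrasts against can be sketched as a parallax threshold: a frame becomes a keyframe once tracked features have moved far enough from the last keyframe. This is only an illustrative baseline, not the paper's reinforcement-learning policy. The function names, the `tracks` input layout, and the 20-pixel threshold are all assumptions made for this example.

```python
import math
from statistics import median


def median_parallax(pts_a, pts_b):
    """Median pixel displacement between matched 2D points of two frames."""
    return median(
        math.hypot(xb - xa, yb - ya)
        for (xa, ya), (xb, yb) in zip(pts_a, pts_b)
    )


def select_keyframes(tracks, min_parallax_px=20.0):
    """Return the indices of frames kept as keyframes.

    tracks[i] is a list of (x, y) positions of the same features in frame i
    (hypothetical input layout). The threshold is an assumed value.
    """
    if not tracks:
        return []
    keyframes = [0]  # the first frame always seeds the keyframe list
    for i in range(1, len(tracks)):
        last_kf = tracks[keyframes[-1]]
        # Keep the frame only when it adds enough parallax over the last keyframe.
        if median_parallax(last_kf, tracks[i]) >= min_parallax_px:
            keyframes.append(i)
    return keyframes


# Toy sequence: two features translating 8 px per frame.
tracks = [[(100.0 + 8.0 * i, 50.0), (200.0 + 8.0 * i, 80.0)] for i in range(10)]
print(select_keyframes(tracks))  # [0, 3, 6, 9]
```

A rule like this depends on an explicit geometric quantity (pixel displacement), which is exactly what the abstract says does not map cleanly onto a feed-forward model whose accuracy depends on latent representations.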

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques