Keyframe-Based Feed-Forward Visual Odometry
Weichen Dai, Wenhan Su, Da Kong, Yuhang Ming, Wanzeng Kong

TL;DR
This paper introduces a reinforcement learning-based keyframe selection method for feed-forward visual odometry, improving efficiency and accuracy by reducing redundancy and leveraging adaptive keyframe policies.
Contribution
It presents a novel keyframe-based approach that integrates reinforcement learning to adaptively select keyframes, enhancing existing foundation model-based visual odometry methods.
Findings
Significant performance improvements over state-of-the-art methods
Efficient reduction of computational redundancy
Robustness across multiple real-world datasets
Abstract
The emergence of visual foundation models has revolutionized visual odometry (VO) and SLAM, enabling pose estimation and dense reconstruction within a single feed-forward network. However, unlike traditional pipelines that leverage keyframe methods to enhance efficiency and accuracy, current foundation-model-based methods, such as VGGT-Long, typically process raw image sequences indiscriminately. This leads to computational redundancy and degraded performance caused by low inter-frame parallax, which provides limited contextual stereo information. Integrating traditional geometric heuristics into these methods is non-trivial, as their performance depends on high-dimensional latent representations rather than explicit geometric metrics. To bridge this gap, we propose a novel keyframe-based feed-forward VO. Instead of relying on hand-crafted rules, our approach employs reinforcement…
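To make the idea of keyframe gating concrete, here is a minimal sketch of a selection loop in which a pluggable policy decides, frame by frame, whether a candidate is passed on to the downstream model. This is not the paper's method: the abstract says the decision is learned with reinforcement learning, while the sketch substitutes a fixed distance threshold purely as a placeholder. The names `select_keyframes` and `threshold_policy`, and the use of a small feature vector per frame, are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for whatever per-frame representation a policy sees.
Frame = Sequence[float]
Policy = Callable[[Frame, Frame], bool]


def select_keyframes(frames: List[Frame], policy: Policy) -> List[int]:
    """Return indices of frames the policy keeps; the first frame is always kept."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        # The policy compares the candidate against the most recent keyframe.
        if policy(frames[keyframes[-1]], frames[i]):
            keyframes.append(i)
    return keyframes


def threshold_policy(min_change: float) -> Policy:
    """Placeholder policy (assumption): keep a frame once it differs enough
    from the last keyframe. A learned policy would replace this function."""
    def policy(last_keyframe: Frame, candidate: Frame) -> bool:
        change = sum((a - b) ** 2 for a, b in zip(last_keyframe, candidate)) ** 0.5
        return change >= min_change
    return policy


# Toy one-dimensional "features": frames 1, 2 and 4 barely differ from the
# preceding keyframe, so they are skipped as redundant.
frames = [[0.0], [0.1], [0.2], [1.0], [1.1], [2.5]]
kept = select_keyframes(frames, threshold_policy(0.9))
print(kept)
```

Only the frames at the returned indices would be forwarded to the feed-forward network, which is where the savings from skipping low-parallax frames would come from. Keeping the policy as a separate callable means a learned decision function could be swapped in without changing the loop.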
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
