ViBA: Implicit Bundle Adjustment with Geometric and Temporal Consistency for Robust Visual Matching
Xiaoji Niu, Yuqing Wang, Yan Wang, Hailiang Tang, and Tisheng Zhang

TL;DR
ViBA is a novel learning framework that combines geometric optimization with feature learning for robust, real-time visual matching and localization in unconstrained video streams.
Contribution
It introduces an implicit bundle adjustment approach integrated into a visual odometry pipeline, enabling continuous online learning with geometric and temporal consistency.
Findings
Reduces translation error by 12-18% compared to state-of-the-art methods.
Maintains over 90% localization accuracy on unseen sequences.
Operates at real-time speeds of 36-91 FPS.
Abstract
Most existing image keypoint detection and description methods rely on datasets with accurate pose and depth annotations, limiting scalability and generalization, and often degrading navigation and localization performance. We propose ViBA, a sustainable learning framework that integrates geometric optimization with feature learning for continuous online training on unconstrained video streams. Embedded in a standard visual odometry pipeline, it consists of an implicitly differentiable geometric residual framework: (i) an initial tracking network for inter-frame correspondences, (ii) depth-based outlier filtering, and (iii) differentiable global bundle adjustment that jointly refines camera poses and feature positions by minimizing reprojection errors. By combining geometric consistency from BA with long-term temporal consistency across frames, ViBA enforces stable and accurate feature…
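To make the reprojection-error objective in the abstract concrete, here is a minimal NumPy sketch of the residual that bundle adjustment minimizes, assuming a standard pinhole camera model. It is an illustration only, not the authors' implementation: the function names (`project`, `reprojection_residuals`, `depth_inlier_mask`) and the depth thresholds are hypothetical, and the simple depth-range check stands in for whatever depth-based outlier filter the paper actually uses.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points X to pixels with a pinhole camera (K, R, t)."""
    Xc = X @ R.T + t                   # world frame -> camera frame
    xy = Xc[:, :2] / Xc[:, 2:3]        # perspective division by depth
    return xy @ K[:2, :2].T + K[:2, 2] # apply focal lengths and principal point

def reprojection_residuals(K, R, t, X, observed_px):
    """Per-point 2D residual: predicted pixel minus observed pixel."""
    return project(K, R, t, X) - observed_px

def depth_inlier_mask(R, t, X, z_min=0.1, z_max=100.0):
    """Hypothetical filter: keep points whose camera-frame depth is in range."""
    z = (X @ R.T + t)[:, 2]
    return (z > z_min) & (z < z_max)

def reprojection_cost(K, R, t, X, observed_px):
    """Sum of squared residuals over inlier points (the quantity BA minimizes)."""
    mask = depth_inlier_mask(R, t, X)
    r = reprojection_residuals(K, R, t, X[mask], observed_px[mask])
    return float(np.sum(r ** 2))
```

A full bundle adjustment would sum this cost over all frames and tracked features and optimize the poses and point positions jointly; the sketch covers a single camera only.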
