ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai, Zekai Yin, Eshed Ohn-Bar

TL;DR
ZeroVO is a calibration-free, semantic-aware visual odometry algorithm that generalizes across diverse cameras and environments, achieving significant improvements without fine-tuning or calibration.
Contribution
The paper introduces a geometry-aware, language-infused, semi-supervised VO method that generalizes to unseen domains without camera calibration or fine-tuning.
Findings
Over 30% improvement on standard benchmarks
Effective in diverse real-world scenarios
Operates without camera calibration or fine-tuning
Abstract
We introduce ZeroVO, a novel visual odometry (VO) algorithm that achieves zero-shot generalization across diverse cameras and environments, overcoming limitations in existing methods that depend on predefined or static camera calibration setups. Our approach incorporates three main innovations. First, we design a calibration-free, geometry-aware network structure capable of handling noise in estimated depth and camera parameters. Second, we introduce a language-based prior that infuses semantic information to enhance robust feature extraction and generalization to previously unseen domains. Third, we develop a flexible, semi-supervised training paradigm that iteratively adapts to new scenes using unlabeled data, further boosting the model's ability to generalize across diverse real-world scenarios. We analyze complex autonomous driving contexts, demonstrating over 30% improvement…
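As a rough illustration of why noise in estimated depth and camera parameters matters, the sketch below back-projects a pixel into 3D with the standard pinhole model and shows how an error in the estimated focal length shifts the resulting point. This is not the ZeroVO network or the authors' code; the function name `unproject` and all numeric values are hypothetical, chosen only for the example.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into a 3D
    camera-frame point using the pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical pixel, depth, and intrinsics (illustrative values only).
pixel_u, pixel_v, depth_m = 400.0, 300.0, 10.0

# Point recovered with the assumed "true" intrinsics.
true_point = unproject(pixel_u, pixel_v, depth_m,
                       fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# Same pixel and depth, but the focal length is overestimated by 10%.
noisy_point = unproject(pixel_u, pixel_v, depth_m,
                        fx=550.0, fy=550.0, cx=320.0, cy=240.0)

# Lateral displacement caused by the focal-length error alone.
lateral_error = ((true_point[0] - noisy_point[0]) ** 2 +
                 (true_point[1] - noisy_point[1]) ** 2) ** 0.5

print(true_point, noisy_point, lateral_error)
```

With these assumed numbers, a 10% focal-length error moves the recovered point sideways by roughly 0.18 m at 10 m depth, which suggests why a calibration-free method needs to tolerate errors in its estimated geometry.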
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Soft Robotics and Applications
