Generalized Geometry Encoding Volume for Real-time Stereo Matching
Jiaxin Liu, Gangwei Xu, Xianqi Wang, Chengliang Zhang, Xin Yang

TL;DR
This paper introduces GGEV, a real-time stereo matching network that encodes domain-invariant geometric priors to improve generalization to unseen scenes, and reports state-of-the-art results on the KITTI and ETH3D benchmarks.
Contribution
The paper proposes GGEV, a real-time stereo matching framework that combines depth-aware features with a Depth-aware Dynamic Cost Aggregation (DDCA) module to improve generalization.
Findings
Outperforms existing real-time methods in zero-shot generalization
Achieves state-of-the-art results on KITTI and ETH3D benchmarks
Efficiently encodes geometric priors for robust disparity estimation
Abstract
Real-time stereo matching methods primarily focus on enhancing in-domain performance but often overlook the critical importance of generalization in real-world applications. In contrast, recent stereo foundation models leverage monocular foundation models (MFMs) to improve generalization, but typically suffer from substantial inference latency. To address this trade-off, we propose Generalized Geometry Encoding Volume (GGEV), a novel real-time stereo matching network that achieves strong generalization. We first extract depth-aware features that encode domain-invariant structural priors as guidance for cost aggregation. Subsequently, we introduce a Depth-aware Dynamic Cost Aggregation (DDCA) module that adaptively incorporates these priors into each disparity hypothesis, effectively enhancing fragile matching relationships in unseen scenes. Both steps are lightweight and complementary,…
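To make the phrase "incorporates these priors into each disparity hypothesis" more concrete, the following is a minimal toy sketch on a single scanline, not the paper's DDCA module. It builds a plain correlation cost volume and then rescales each disparity hypothesis by a per-pixel weight derived from a guidance signal. The function names, the softmax weighting, and the `guidance_logits` input are all assumptions made for illustration; the abstract does not specify how GGEV computes or applies its depth-aware guidance.

```python
import math


def correlation_volume(left, right, max_disp):
    """cost[d][x] = dot(left[x], right[x - d]); 0.0 where x - d is out of range."""
    width = len(left)
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d >= 0:
                row.append(sum(a * b for a, b in zip(left[x], right[x - d])))
            else:
                row.append(0.0)
        volume.append(row)
    return volume


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def modulate(volume, guidance_logits):
    """Hypothetical guidance step: scale every disparity hypothesis at pixel x
    by a weight obtained from a softmax over guidance_logits[x]."""
    max_disp, width = len(volume), len(volume[0])
    out = [[0.0] * width for _ in range(max_disp)]
    for x in range(width):
        weights = softmax(guidance_logits[x])
        for d in range(max_disp):
            out[d][x] = volume[d][x] * weights[d]
    return out


def winner_take_all(volume):
    """Pick the disparity with the highest score at each pixel (ties -> smallest d)."""
    max_disp, width = len(volume), len(volume[0])
    return [max(range(max_disp), key=lambda d: volume[d][x]) for x in range(width)]


def one_hot(index, size=5):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy scanline: the left view is the right view shifted by one pixel.
right = [one_hot(0), one_hot(1), one_hot(2), one_hot(3)]
left = [one_hot(4), one_hot(0), one_hot(1), one_hot(2)]

volume = correlation_volume(left, right, max_disp=3)
uniform_guidance = [[0.0, 0.0, 0.0] for _ in left]  # no prior: equal weights
disparity = winner_take_all(modulate(volume, uniform_guidance))
```

With uniform guidance the weights are equal, so the result matches plain winner-take-all matching. Non-uniform logits would favour or suppress individual hypotheses per pixel, which is the general idea of guiding aggregation with an external prior.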
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
