Iterative Geometry Encoding Volume for Stereo Matching
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang

TL;DR
This paper introduces IGEV-Stereo, a novel deep network architecture that encodes geometry and context for improved stereo matching, achieving state-of-the-art accuracy and efficiency on benchmark datasets.
Contribution
The paper proposes IGEV-Stereo, a new architecture that builds a combined geometry encoding volume and iteratively updates disparity maps, improving accuracy and speed.
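The summary does not describe how such a volume is built. As a rough illustration, here is a minimal pure-Python sketch under stated assumptions. Correlation is taken as a normalized dot product between left and right features on the same row, which presumes a rectified pair. The "geometry" volume comes from a hypothetical 3-tap box filter along x, standing in for the learned regularization network. The "combined" volume concatenates both volumes per pixel at several disparity-pooled levels. All function names (`correlation_volume`, `pool_disparity`, `regularize`, `combined_pyramid`) are hypothetical and do not come from the IGEV-Stereo code.

```python
from typing import List

# Hypothetical toy layout: feature maps are [H][W][C], cost volumes are [H][W][D].
FeatureMap = List[List[List[float]]]
Volume = List[List[List[float]]]


def correlation_volume(f_left: FeatureMap, f_right: FeatureMap, max_disp: int) -> Volume:
    """vol[y][x][d] = <f_left[y][x], f_right[y][x - d]> / C, left at 0 where x - d < 0."""
    h, w, c = len(f_left), len(f_left[0]), len(f_left[0][0])
    vol = [[[0.0] * max_disp for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Only disparities that keep x - d inside the image are valid.
            for d in range(min(max_disp, x + 1)):
                fl, fr = f_left[y][x], f_right[y][x - d]
                vol[y][x][d] = sum(a * b for a, b in zip(fl, fr)) / c
    return vol


def pool_disparity(vol: Volume, k: int = 2) -> Volume:
    """Average-pool each pixel's cost vector along disparity (kernel = stride = k)."""
    return [
        [[sum(costs[i:i + k]) / len(costs[i:i + k]) for i in range(0, len(costs), k)]
         for costs in row]
        for row in vol
    ]


def regularize(vol: Volume) -> Volume:
    """HYPOTHETICAL stand-in for the learned 3D network: 3-tap box filter along x."""
    out = []
    for row in vol:
        w = len(row)
        new_row = []
        for x in range(w):
            nbrs = [row[j] for j in (x - 1, x, x + 1) if 0 <= j < w]
            # Average neighbouring cost vectors disparity by disparity.
            new_row.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
        out.append(new_row)
    return out


def combined_pyramid(geo: Volume, apc: Volume, levels: int = 2) -> List[Volume]:
    """Per level, concatenate geometry-volume and correlation costs for each pixel."""
    pyramid = []
    for _ in range(levels):
        pyramid.append([[g + a for g, a in zip(grow, arow)] for grow, arow in zip(geo, apc)])
        geo, apc = pool_disparity(geo), pool_disparity(apc)
    return pyramid


# Toy usage: a 1x3 feature map with C = 2, matched against itself.
f = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
apc = correlation_volume(f, f, max_disp=2)
geo = regularize(apc)
pyramid = combined_pyramid(geo, apc, levels=2)
```

Coarser levels trade disparity resolution for a wider effective search range. Keeping the raw correlation next to the smoothed volume is one simple way to preserve local matching detail alongside the more global cues.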
Findings
Ranks 1st on KITTI 2015 and KITTI 2012 (Reflective) among published methods.
Fastest among the top 10 methods on KITTI benchmarks.
Demonstrates strong cross-dataset generalization and high inference efficiency.
Abstract
Recurrent All-Pairs Field Transforms (RAFT) has shown great potential in matching tasks. However, all-pairs correlations lack non-local geometry knowledge and have difficulty tackling local ambiguities in ill-posed regions. In this paper, we propose Iterative Geometry Encoding Volume (IGEV-Stereo), a new deep network architecture for stereo matching. The proposed IGEV-Stereo builds a combined geometry encoding volume that encodes geometry and context information as well as local matching details, and iteratively indexes it to update the disparity map. To speed up convergence, we exploit the geometry encoding volume (GEV) to regress an accurate starting point for the ConvGRU iterations. Our IGEV-Stereo ranks 1st on KITTI 2015 and 2012 (Reflective) among all published methods and is the fastest among the top 10 methods. In addition, IGEV-Stereo has strong cross-dataset generalization as well as high inference…
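The abstract describes two steps: a starting disparity regressed from the volume, then iterations that index the volume around the current estimate and update it. The pure-Python sketch below shows that loop for a single pixel's cost vector under stated assumptions. The starting point uses soft-argmin (a common regression choice; the abstract does not name one). The lookup linearly interpolates a window of radius `radius` around the current disparity. A hypothetical hand-written rule replaces the learned ConvGRU. Costs are treated as similarities, so higher means a better match. The names `soft_argmin`, `sample`, `lookup` and `refine` are illustrative only.

```python
import math
from typing import List


def soft_argmin(costs: List[float]) -> float:
    """Expected index under softmax(costs); costs are similarities (higher = better)."""
    m = max(costs)  # subtract the max for numerical stability
    w = [math.exp(c - m) for c in costs]
    return sum(d * wi for d, wi in enumerate(w)) / sum(w)


def sample(costs: List[float], d: float) -> float:
    """Linearly interpolate the cost vector at fractional disparity d (clamped to range)."""
    d = min(max(d, 0.0), len(costs) - 1.0)
    lo = int(math.floor(d))
    hi = min(lo + 1, len(costs) - 1)
    t = d - lo
    return (1 - t) * costs[lo] + t * costs[hi]


def lookup(costs: List[float], d: float, radius: int) -> List[float]:
    """Window of 2 * radius + 1 interpolated costs centred on the current estimate d."""
    return [sample(costs, d + o) for o in range(-radius, radius + 1)]


def refine(costs: List[float], iters: int = 4, radius: int = 2) -> float:
    d = soft_argmin(costs)  # starting point regressed from the whole volume
    for _ in range(iters):
        window = lookup(costs, d, radius)
        # HYPOTHETICAL stand-in for the ConvGRU: move by the soft-argmin offset
        # of the local window relative to its centre (index `radius`).
        d += soft_argmin(window) - radius
    return d


# Toy usage: a single pixel whose best match is at disparity 3.
peaked = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
print(round(refine(peaked, iters=4, radius=2), 3))
```

In IGEV-Stereo the update is learned. The hand-written rule here only illustrates why a good starting point matters: each iteration sees only a small window, so an estimate that starts near the correct disparity needs fewer iterations to settle.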
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
