Rethinking Iterative Stereo Matching from Diffusion Bridge Model Perspective
Yuguang Shi

TL;DR
This paper introduces a novel diffusion model-based iterative stereo matching approach that enhances disparity map detail and accuracy, achieving state-of-the-art results with fewer iterations.
Contribution
It proposes integrating diffusion models into stereo matching with a new T-GRU and attention mechanisms, improving detail preservation and performance.
Findings
Ranks first on the Scene Flow dataset
Improves performance by more than 7% relative to competitors
Requires only 8 iterations for top results
Abstract
Recently, iteration-based stereo matching has shown great potential. However, these models optimize the disparity map using RNN variants, and this discrete optimization process loses information, which restricts the level of detail that can be expressed in the generated disparity map. To address this issue, we propose a novel training approach that incorporates diffusion models into the iterative optimization process. We design a Time-based Gated Recurrent Unit (T-GRU) to correlate temporal and disparity outputs. Unlike standard recurrent units, it employs Agent Attention to generate more expressive features. We also design an attention-based context network to capture a large amount of contextual information. Experiments on several public benchmarks show that we achieve competitive stereo matching performance. Our model ranks first in the Scene Flow…
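The idea of a recurrent unit that ties the iteration step to the disparity update can be illustrated with a minimal sketch. This is not the paper's T-GRU: it is a plain GRU cell whose input is concatenated with a sinusoidal embedding of the step index, operating on a single feature vector rather than an image. The class name `TimeConditionedGRU`, the helper `refine`, the embedding scheme, and all dimensions are assumptions made for illustration; Agent Attention, the context network, and the diffusion noise schedule are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_embedding(t, dim):
    """Sinusoidal embedding of a scalar step index t (dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class TimeConditionedGRU:
    """Hypothetical GRU cell whose gates also see a time-step embedding."""

    def __init__(self, input_dim, hidden_dim, time_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = input_dim + time_dim + hidden_dim
        scale = 1.0 / np.sqrt(in_dim)
        self.time_dim = time_dim
        self.hidden_dim = hidden_dim
        self.Wz = rng.normal(0.0, scale, (hidden_dim, in_dim))  # update gate
        self.Wr = rng.normal(0.0, scale, (hidden_dim, in_dim))  # reset gate
        self.Wh = rng.normal(0.0, scale, (hidden_dim, in_dim))  # candidate state
        self.w_out = rng.normal(0.0, scale, hidden_dim)         # disparity head

    def step(self, h, x, t):
        # Append the step embedding to the input features.
        xt = np.concatenate([x, time_embedding(t, self.time_dim)])
        z = sigmoid(self.Wz @ np.concatenate([xt, h]))
        r = sigmoid(self.Wr @ np.concatenate([xt, h]))
        h_tilde = np.tanh(self.Wh @ np.concatenate([xt, r * h]))
        h_new = (1.0 - z) * h + z * h_tilde
        delta = self.w_out @ h_new  # residual disparity update
        return h_new, delta

def refine(cell, features, disparity=0.0, iterations=8):
    """Iteratively refine one disparity value; 8 steps mirrors the summary above."""
    h = np.zeros(cell.hidden_dim)
    for t in range(iterations):
        x = np.concatenate([features, [disparity]])
        h, delta = cell.step(h, x, t)
        disparity = disparity + delta
    return disparity, h

cell = TimeConditionedGRU(input_dim=5, hidden_dim=16, time_dim=8)
disparity, hidden = refine(cell, np.ones(4))
```

With untrained random weights the returned disparity is meaningless; the sketch only shows how the step index enters each gate and how the disparity is accumulated as a sum of residual updates.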
Taxonomy
Topics: Advanced Vision and Imaging · Cancer-related molecular mechanisms research
Methods: Diffusion
