A Flexible Recursive Network for Video Stereo Matching Based on Residual Estimation
Youchen Zhao, Guorong Luo, Hua Zhong, Haixiong Li

TL;DR
This paper introduces RecSM, a recursive neural network for video stereo matching that leverages residual estimation and multi-scale modules to significantly speed up processing while maintaining high accuracy.
Contribution
The paper proposes RecSM, a residual estimation network whose flexible recursive structure combines three modules (MREM, DOM, and TAM) for efficient and accurate video stereo matching.
Findings
RecSM achieves a 4x speed improvement over ACVNet.
RecSM maintains high accuracy, with a decrease of only 0.7%.
The flexible Stackable Computation Structure (SCS) design adapts to different practical scenarios.
Abstract
Because disparity is highly similar between consecutive frames of a video sequence, the region where disparity changes can be defined as a residual map and computed directly. Based on this, we propose RecSM, a network based on residual estimation with a flexible recursive structure for video stereo matching. RecSM accelerates stereo matching with a Multi-scale Residual Estimation Module (MREM), which uses the temporal context as a reference and rapidly calculates the disparity for the current frame by computing only the residual values between the current and previous frames. To further reduce the error of the estimated disparities, we use the Disparity Optimization Module (DOM) and the Temporal Attention Module (TAM) to enforce constraints between modules; together with MREM, they form a flexible Stackable Computation Structure (SCS), which allows for the design of…
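The residual idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and does not specify how MREM, DOM, or TAM compute their outputs, so the names `add_residual`, `stacked_update`, and `toy_stage` are hypothetical, and the toy stage stands in for a learned module. The sketch only shows the arithmetic (current disparity = previous disparity + residual) and how a variable number of stacked stages could trade time for refinement.

```python
from typing import Callable, List

Disparity = List[List[float]]  # row-major disparity map, in pixels


def add_residual(prev: Disparity, residual: Disparity) -> Disparity:
    """Current-frame disparity = previous-frame disparity + residual."""
    return [
        [p + r for p, r in zip(prev_row, res_row)]
        for prev_row, res_row in zip(prev, residual)
    ]


def stacked_update(
    prev: Disparity,
    stages: List[Callable[[Disparity], Disparity]],
) -> Disparity:
    """Apply each stage's residual in turn (assumed stacking scheme).

    More stages mean more refinement steps and more compute.
    """
    current = prev
    for estimate_residual in stages:
        current = add_residual(current, estimate_residual(current))
    return current


def toy_stage(disparity: Disparity) -> Disparity:
    """Hypothetical stand-in for a learned residual estimator.

    Reports a +0.5 px change at pixel (0, 1) and no change elsewhere.
    A real module would also use the stereo pair and temporal context.
    """
    return [
        [0.5 if (i, j) == (0, 1) else 0.0 for j in range(len(row))]
        for i, row in enumerate(disparity)
    ]


prev_disparity = [[10.0, 10.0], [12.0, 12.0]]
one_stage = stacked_update(prev_disparity, [toy_stage])
two_stage = stacked_update(prev_disparity, [toy_stage, toy_stage])
```

Because most residual entries are zero when consecutive frames are similar, only the changed region contributes to the update, which is the source of the speed-up the abstract describes.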
Taxonomy
Topics: Advanced Vision and Imaging · Image and Video Stabilization · Image and Signal Denoising Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
