ESMStereo: Enhanced ShuffleMixer Disparity Upsampling for Real-Time and Accurate Stereo Matching
Mahmoud Tahmasebi, Saif Huq, Kevin Meehan, Marion McAfee

TL;DR
ESMStereo introduces an enhanced ShuffleMixer module that improves disparity upsampling in stereo matching, enabling real-time, high-accuracy depth estimation by effectively restoring details with low computational cost.
Contribution
The paper proposes the Enhanced Shuffle Mixer (ESM) to mitigate information loss in small-scale cost volumes, enabling real-time stereo matching with high accuracy.
Findings
Achieves 116 FPS inference on high-end GPUs.
Provides disparity maps with high accuracy in real-time.
Uses a compact hourglass network for detail refinement.
Abstract
Stereo matching has become an increasingly important component of modern autonomous systems. Developing deep learning-based stereo matching models that deliver high accuracy while operating in real-time continues to be a major challenge in computer vision. In the domain of cost-volume-based stereo matching, accurate disparity estimation depends heavily on large-scale cost volumes. However, such large volumes store substantial redundant information and also require computationally intensive aggregation units for processing and regression, making real-time performance unattainable. Conversely, small-scale cost volumes followed by lightweight aggregation units provide a promising route for real-time performance, but lack sufficient information to ensure highly accurate disparity estimation. To address this challenge, we propose the Enhanced Shuffle Mixer (ESM) to mitigate information loss…
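To make the trade-off in the abstract concrete, here is a minimal NumPy sketch of a generic cost-volume pipeline: build a small correlation volume from low-resolution features, regress disparity with a softmax-weighted average, then upsample. This is not the ESM module or the authors' network. The function names, the `temperature` parameter, and the nearest-neighbour upsampling are assumptions chosen for illustration; they stand in for the plain baseline whose lost detail a learned upsampler is meant to restore.

```python
import numpy as np

def correlation_cost_volume(left, right, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) feature maps.

    Entry [d, y, x] is the channel-mean product of left[:, y, x] and
    right[:, y, x - d]; positions where x - d < 0 are left at zero.
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left * right).mean(axis=0)
        else:
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume

def soft_argmax_disparity(volume, temperature=1.0):
    """Regress a sub-pixel disparity map as the softmax-weighted mean over d."""
    logits = volume / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    candidates = np.arange(volume.shape[0]).reshape(-1, 1, 1)
    return (prob * candidates).sum(axis=0)

def upsample_disparity(disp, scale):
    """Nearest-neighbour upsampling; disparity values grow with resolution."""
    return np.kron(disp, np.ones((scale, scale))) * scale
```

The memory and compute cost of the volume scales with `max_disp * H * W`, which is why shrinking the volume helps speed, while the naive upsampling step at the end is where fine detail is lost.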
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image and Signal Denoising Methods
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
