2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth Estimation

Rohit Choudhary; Mansi Sharma; Rithvik Anil

arXiv:2210.15374 · cs.CV · October 28, 2022

Open Access

TL;DR

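The paper introduces 2T-UNet, a two-tower CNN architecture that estimates stereo depth without explicit stereo matching. One tower takes the right image, and the other takes the left image together with its monocular depth clues. The authors report that it outperforms state-of-the-art methods on the Scene Flow dataset.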

Contribution

It proposes a new two-tower network architecture that replaces cost volume construction with twin convolutional towers and incorporates monocular depth clues for enhanced stereo depth estimation.

Findings

01. Outperforms state-of-the-art methods on the Scene Flow dataset

02. Effective on complex natural scenes

03. Suitable for real-time applications

Abstract

Stereo correspondence matching is an essential part of the multi-step stereo depth estimation process. This paper revisits the depth estimation problem and avoids the explicit stereo matching step by using a simple two-tower convolutional neural network, called 2T-UNet. The idea behind 2T-UNet is to replace cost volume construction with twin convolutional towers, which are allowed to have different weights. Additionally, the inputs to the twin encoders in 2T-UNet differ from those of existing stereo methods. Generally, a stereo network takes a left and right image pair as input to determine the scene geometry. In the 2T-UNet model, however, the right stereo image is one input, and the left stereo image, along with its monocular depth clue information, is the other. Depth clues provide complementary suggestions…
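To make the input arrangement concrete, here is a minimal sketch in plain Python of two towers with independent weights, one fed the right image and the other fed the left image concatenated with a one-channel depth clue. It is not the authors' implementation: each tower is reduced to a single 1x1 convolution with ReLU, the channel-wise concatenation used to fuse the towers and the one-channel output head are assumptions, and all names (`conv1x1_relu`, `two_tower_forward`, `w_left`, `w_right`, `w_head`) are hypothetical.

```python
import random


def conv1x1_relu(image, weights):
    """Per-pixel linear map (a 1x1 convolution) followed by ReLU.

    image:   H x W x C_in nested lists
    weights: C_out x C_in nested lists
    """
    return [
        [
            [max(0.0, sum(w * x for w, x in zip(row_w, pixel))) for row_w in weights]
            for pixel in row
        ]
        for row in image
    ]


def random_weights(c_out, c_in, rng):
    """Random C_out x C_in weight matrix (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(c_in)] for _ in range(c_out)]


def concat_channels(a, b):
    """Concatenate two H x W x C maps along the channel axis."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def two_tower_forward(left_rgb, left_depth_clue, right_rgb, w_left, w_right, w_head):
    # Tower A input: left image plus its monocular depth clue (3 + 1 = 4 channels).
    left_input = concat_channels(left_rgb, left_depth_clue)
    # Each tower uses its own weights; nothing is shared between them.
    feat_left = conv1x1_relu(left_input, w_left)
    # Tower B input: right image only (3 channels).
    feat_right = conv1x1_relu(right_rgb, w_right)
    # ASSUMPTION: fuse the towers by channel concatenation, then predict
    # one non-negative value per pixel. No cost volume is built.
    fused = concat_channels(feat_left, feat_right)
    return conv1x1_relu(fused, w_head)


# Toy data: a 2 x 3 stereo pair and a one-channel depth clue for the left view.
rng = random.Random(0)
H, W, F = 2, 3, 4  # F = feature channels per tower (arbitrary choice)


def rand_map(channels):
    return [[[rng.random() for _ in range(channels)] for _ in range(W)] for _ in range(H)]


left_rgb, right_rgb, left_depth_clue = rand_map(3), rand_map(3), rand_map(1)
w_left = random_weights(F, 4, rng)   # left tower sees RGB + depth clue
w_right = random_weights(F, 3, rng)  # right tower sees RGB only
w_head = random_weights(1, 2 * F, rng)

depth = two_tower_forward(left_rgb, left_depth_clue, right_rgb, w_left, w_right, w_head)
print(len(depth), len(depth[0]), len(depth[0][0]))
```

The point of the sketch is only the data flow: the two towers receive inputs with different channel counts and hold separate weight matrices, so they cannot be a weight-shared Siamese pair. A real UNet tower would add downsampling, upsampling, and skip connections.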

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Advanced Image Processing Techniques

Methods: Convolution