Context-Enhanced Stereo Transformer

Weiyu Guo, Zhaoshuo Li, Yongkui Yang, Zheng Wang, Russell H. Taylor, Mathias Unberath, Alan Yuille, and Yingwei Li

arXiv:2210.11719·cs.CV·October 24, 2022

TL;DR

This paper introduces Context Enhanced Stereo Transformer (CSTR), a novel model that incorporates a Context Enhanced Path to improve stereo depth estimation, especially in challenging regions, by capturing long-range global information, leading to superior performance across multiple datasets.

Contribution

The paper proposes the Context Enhanced Path (CEP) module and plugs it into Stereo Transformer to improve generalization and robustness in stereo depth estimation, targeting failure cases of existing methods such as large uniform regions.

Findings

01. CSTR outperforms prior approaches on multiple datasets.

02. CEP effectively captures long-range global information.

03. CSTR achieves an 11% improvement in zero-shot synthetic-to-real transfer on Middlebury-2014.

Abstract

Stereo depth estimation is of great interest for computer vision research. However, existing methods struggle to generalize and to predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose Context Enhanced Path (CEP). CEP improves generalization and robustness against common failure cases in existing solutions by capturing long-range global information. We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer. CSTR is examined on distinct public datasets, such as Scene Flow, Middlebury-2014, KITTI-2015, and MPI-Sintel. We find CSTR outperforms prior approaches by a large margin. For example, in the zero-shot synthetic-to-real setting, CSTR outperforms the best competing approaches on Middlebury-2014…
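The abstract attributes CEP's benefit to long-range global information. As a rough illustration of that general idea only, the sketch below applies plain single-head scaled dot-product self-attention with a residual connection to a toy scanline. It is not the CEP module or the authors' code, whose details this page does not give. The function names, the toy features, and the omission of learned query/key/value projections are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features):
    """Scaled dot-product attention weights between every pair of positions.

    Assumption: queries and keys are the raw features (no learned projections).
    """
    scale = 1.0 / math.sqrt(len(features[0]))
    weights = []
    for query in features:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in features]
        weights.append(softmax(scores))
    return weights


def add_global_context(features):
    """Residual update: each position adds an attention-weighted mix of all positions."""
    dim = len(features[0])
    enhanced = []
    for i, row in enumerate(attention_weights(features)):
        context = [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        enhanced.append([features[i][d] + context[d] for d in range(dim)])
    return enhanced


# Hypothetical toy scanline: two distinctive end pixels around a uniform middle region.
scanline = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
weights = attention_weights(scanline)
enhanced = add_global_context(scanline)
```

In this toy case the first pixel weights the distant last pixel exactly as much as itself, because attention depends on feature similarity rather than spatial distance. That is the sense in which an attention-style path can bring in information from far outside a local window.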

Code & Models

Repositories

guoweiyu/context-enhanced-stereo-transformer (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection