ChiTransformer: Towards Reliable Stereo from Cues

Qing Su; Shihao Ji

arXiv:2203.04554 · cs.CV · November 2, 2023

TL;DR

ChiTransformer introduces a biologically inspired self-supervised binocular depth estimation method using vision transformers with gated cross-attention, significantly improving stereo matching accuracy in various environments.

Contribution

The paper proposes a novel ChiTransformer architecture that leverages gated positional cross-attention in vision transformers for reliable stereo depth estimation, inspired by the human visual system.

Findings

01  Reports an 11% improvement over state-of-the-art self-supervised stereo methods.

02  Works on both rectilinear and fisheye images.

03  Demonstrates robustness in dynamic and cluttered environments.

Abstract

Current stereo matching techniques are challenged by restricted searching space, occluded regions and sheer size. While single image depth estimation is spared from these challenges and can achieve satisfactory results with the extracted monocular cues, the lack of stereoscopic relationship renders the monocular prediction less reliable on its own, especially in highly dynamic or cluttered environments. To address these issues in both scenarios, we present an optic-chiasm-inspired self-supervised binocular depth estimation method, wherein a vision transformer (ViT) with gated positional cross-attention (GPCA) layers is designed to enable feature-sensitive pattern retrieval between views while retaining the extensive context information aggregated through self-attentions. Monocular cues from a single view are thereafter conditionally rectified by a blending layer with the retrieved…
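
As a rough illustration of the gating idea named in the abstract, the sketch below blends content-based and position-based attention weights with a sigmoid gate, in the style of the gated positional self-attention used in ConViT, but with queries taken from one view and keys and values from the other. This is not the authors' implementation: the function names, the `pos_scores` input, and the scalar `gate_logit` are assumptions made for illustration, and the paper's actual GPCA layer may differ.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_cross_attention(queries, keys, values, pos_scores, gate_logit):
    """Single-head gated cross-attention (illustrative sketch).

    queries:    token vectors from the reference view
    keys/values: token vectors from the second view
    pos_scores: pos_scores[i][j] is a positional score between query i
                and key j (hypothetical input, e.g. from relative positions)
    gate_logit: scalar; sigmoid(gate_logit) weights the positional term
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    g = sigmoid(gate_logit)
    out = []
    for q, pos_row in zip(queries, pos_scores):
        # Content attention: scaled dot product against the other view.
        content = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        # Positional attention: depends only on token positions.
        positional = softmax(pos_row)
        # The gate interpolates between the two attention distributions.
        weights = [(1.0 - g) * c + g * p for c, p in zip(content, positional)]
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    kv = [[1.0, 0.0], [0.0, 1.0]]
    print(gated_cross_attention(q, kv, kv, [[0.0, 0.0]], gate_logit=0.0))
```

With a strongly negative `gate_logit` the layer behaves like ordinary cross-attention, and with a strongly positive one it attends by position alone; a learned gate lets each head settle somewhere in between.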

Code & Models

Repositories

isl-cv/chitransformer
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Multi-Head Attention · Residual Connection · Layer Normalization · Vision Transformer