Unifying Flow, Stereo and Depth Estimation

Haofei Xu; Jing Zhang; Jianfei Cai; Hamid Rezatofighi; Fisher Yu; Dacheng Tao; Andreas Geiger

arXiv:2211.05783 · cs.CV · July 27, 2023 · 1 citation

TL;DR

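This paper introduces a single unified model for optical flow, stereo matching, and depth estimation based on Transformer-enhanced dense correspondence matching. Because the architecture and parameters are shared across tasks, the model enables cross-task transfer, and it outperforms or compares favorably to specialized methods.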

Contribution

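The authors propose a single model that formulates three motion and 3D perception tasks as one dense correspondence matching problem. Transformer cross-attention makes the extracted features more discriminative, and the shared design is simpler and more efficient than task-specific architectures.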
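
Cross-attention, credited above with the improved feature quality, lets every position in one image aggregate features from the other image. The block below is a minimal single-head NumPy sketch under our own assumptions: the name `cross_attention`, the random stand-ins for the learned projections, and the omission of multi-head splitting, layer normalization and the feed-forward layer are simplifications for illustration, not details taken from the unimatch code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: positions of image A attend to image B.

    feat_a: (N, C) flattened features of image A; feat_b: (M, C) of image B.
    w_q, w_k, w_v: (C, C) projection matrices (learned in a real model).
    Returns the updated features of A and the (N, M) attention weights.
    """
    q = feat_a @ w_q                           # queries come from image A
    k = feat_b @ w_k                           # keys come from image B
    v = feat_b @ w_v                           # values come from image B
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # each row sums to 1
    return feat_a + attn @ v, attn             # residual update with information from B

rng = np.random.default_rng(0)
n, c = 12, 8                                   # 12 positions (e.g. a 3x4 feature map), 8 channels
feat_a = rng.standard_normal((n, c))
feat_b = rng.standard_normal((n, c))
w_q, w_k, w_v = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

# Sketch choice: apply in both directions with the same weights so each view sees the other
new_a, attn_ab = cross_attention(feat_a, feat_b, w_q, w_k, w_v)
new_b, attn_ba = cross_attention(feat_b, feat_a, w_q, w_k, w_v)
```

Each row of `attn_ab` is a probability distribution over the positions of image B, so every updated feature of A is its original value plus a weighted summary of B.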

Findings

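1. Outperforms RAFT on the challenging Sintel dataset.

2. With a few additional task-specific refinement steps, outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets.

3. Is simpler and more efficient in model design and inference speed.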

Abstract

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our…
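
The central idea, solving each task by directly comparing feature similarities, can be sketched as a softmax over a correlation volume. For optical flow every pixel of image 1 is compared with every pixel of image 2 (a 2D search); for rectified stereo the search collapses to the same image row (1D). This minimal NumPy sketch is our own illustration, not code from autonomousvision/unimatch: the function names `global_flow` and `global_disparity`, the 1/sqrt(C) scaling, and the mask that forbids negative disparities are assumptions. The depth variant, which needs camera poses, is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_flow(feat1, feat2):
    """Hypothetical 2D matching: (H, W, C) features -> (H, W, 2) flow as (dx, dy)."""
    h, w, c = feat1.shape
    f1, f2 = feat1.reshape(h * w, c), feat2.reshape(h * w, c)
    prob = softmax(f1 @ f2.T / np.sqrt(c), axis=1)   # (HW, HW) matching distribution
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(float)  # (x, y) per pixel
    matched = prob @ grid                            # expected match position in image 2
    return (matched - grid).reshape(h, w, 2)         # flow = displacement

def global_disparity(feat_left, feat_right):
    """Hypothetical 1D matching along rows: (H, W, C) features -> (H, W) disparity."""
    h, w, c = feat_left.shape
    corr = np.einsum("hic,hjc->hij", feat_left, feat_right) / np.sqrt(c)  # (H, W, W)
    invalid = np.triu(np.ones((w, w), dtype=bool), k=1)  # right x' > left x is not allowed
    prob = softmax(np.where(invalid, -1e9, corr), axis=2)
    xs = np.arange(w, dtype=float)
    return xs - prob @ xs                            # disparity = x - expected x'

h, w, d = 4, 8, 2
feat1 = 20.0 * np.eye(h * w).reshape(h, w, h * w)    # a distinct one-hot feature per pixel
feat2 = np.roll(feat1, shift=d, axis=1)              # image 2: content moved d pixels right
flow = global_flow(feat1, feat2)

feat_right = np.roll(feat1, shift=-d, axis=1)        # right view: content moved d pixels left
disp = global_disparity(feat1, feat_right)
```

With perfectly distinct one-hot features the matching distribution is effectively one-hot, so `flow` and `disp` recover the shift `d` away from the wrap-around borders. Learned features give softer distributions, which is why discriminative features matter in this formulation.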

Code & Models

Repositories

autonomousvision/unimatch (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Softmax · Adam · Absolute Position Encodings · Layer Normalization