Unifying Flow, Stereo and Depth Estimation
Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger

TL;DR
This paper introduces a unified model for optical flow, stereo matching, and depth estimation using a Transformer-based dense correspondence approach, enabling cross-task transfer and outperforming specialized methods.
Contribution
The authors propose a single model that unifies three perception tasks through a dense correspondence formulation using cross-attention, improving feature quality and efficiency.
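To make the cross-attention idea concrete, here is a minimal single-head sketch in NumPy. It is an illustration under stated assumptions, not the authors' architecture: the function name cross_attention, the projection matrices Wq, Wk, Wv, and the plain residual connection are hypothetical, and layer normalization, multi-head splitting, positional encodings, and the feed-forward layer are all omitted. Queries come from one image's features while keys and values come from the other image, so each output token mixes in information from the second view.

```python
import numpy as np

def cross_attention(src, tgt, Wq, Wk, Wv):
    """Single-head cross-attention (illustrative sketch only).

    src: (N, C) feature tokens from image 1 (queries).
    tgt: (M, C) feature tokens from image 2 (keys and values).
    Wq, Wk, Wv: (C, C) projection matrices (hypothetical parameters).
    Returns the updated src tokens (N, C) and the attention map (N, M).
    """
    q = src @ Wq
    k = tgt @ Wk
    v = tgt @ Wv
    # Scaled dot-product similarity between every src and tgt token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the tgt tokens.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    # Residual update: src features enriched with information from tgt.
    return src + attn @ v, attn

# Usage with random features and weights.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 4))
tgt = rng.normal(size=(8, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
updated, attn = cross_attention(src, tgt, Wq, Wk, Wv)
```

Each row of attn sums to one, so every image-1 token receives a convex combination of projected image-2 features. Without positional information the result does not depend on the order of the image-2 tokens.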
Findings
Outperforms RAFT on the Sintel dataset.
Achieves state-of-the-art results on 10 datasets.
Simpler and more efficient model design.
Abstract
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our…
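The idea of solving dense correspondence by directly comparing feature similarities can be sketched in a few lines of NumPy. This is a hedged illustration for the optical flow case, not the authors' implementation: the function name match_flow, the dot-product similarity with 1/sqrt(C) scaling, and the softmax-weighted average of target coordinates are assumptions chosen for simplicity. Every pixel in the first feature map is compared with every pixel in the second, and the flow is the expected matching position minus the pixel's own position.

```python
import numpy as np

def match_flow(feat1, feat2):
    """Dense matching by feature similarity (illustrative sketch only).

    feat1, feat2: (H, W, C) feature maps of the two images.
    Returns a flow field of shape (H, W, 2) stored as (dx, dy).
    """
    H, W, C = feat1.shape
    f1 = feat1.reshape(H * W, C)
    f2 = feat2.reshape(H * W, C)
    # All-pairs similarity between image-1 and image-2 pixels.
    corr = f1 @ f2.T / np.sqrt(C)
    # Softmax over image-2 pixels gives a matching distribution.
    corr = corr - corr.max(axis=-1, keepdims=True)
    prob = np.exp(corr)
    prob = prob / prob.sum(axis=-1, keepdims=True)
    # Pixel coordinate grid as (x, y) pairs, one row per pixel.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(H * W, 2).astype(np.float64)
    # Expected matching coordinate in image 2, minus the source coordinate.
    matched = prob @ grid
    return (matched - grid).reshape(H, W, 2)

# Usage: features of image 1 are image-2 features shifted by one pixel in x.
H, W = 4, 5
feat2 = 50.0 * np.eye(H * W).reshape(H, W, H * W)  # one distinct feature per pixel
feat1 = np.roll(feat2, shift=-1, axis=1)           # feat1[y, x] == feat2[y, x + 1]
flow = match_flow(feat1, feat2)
```

In this toy setup the recovered horizontal flow is close to one pixel everywhere except the last column, where the roll wraps around. For rectified stereo the same comparison could be restricted to pixels on the same row, since correspondences there lie along the horizontal scanline.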
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Enhancement Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Softmax · Adam · Absolute Position Encodings · Layer Normalization
