TransFlow: Transformer as Flow Learner

Yawen Lu, Qifan Wang, Siqi Ma, Tong Geng, Yingjie Victor Chen, Huaijin Chen, and Dongfang Liu

arXiv:2304.11523·cs.CV·April 25, 2023·1 cites

Open Access

TL;DR

TransFlow introduces a transformer-based architecture for optical flow estimation, outperforming CNN methods by capturing global dependencies, handling occlusions better, and simplifying training procedures, achieving state-of-the-art results on multiple benchmarks.

Contribution

The paper presents TransFlow, a novel pure transformer architecture for optical flow that improves accuracy, robustness, and training simplicity over traditional CNN-based methods.

Findings

01 Achieves state-of-the-art results on the Sintel and KITTI-15 datasets.

02 Effectively captures global dependencies with self-attention mechanisms.

03 Improves handling of occlusion and motion blur in flow estimation.

Abstract

Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustworthy matching in flow estimation by utilizing spatial self-attention and cross-attention mechanisms between adjacent frames to effectively capture global dependencies. Second, it recovers more compromised information (e.g., occlusion and motion blur) in flow estimation through long-range temporal association in dynamic scenes. Third, it enables a concise self-learning paradigm and effectively eliminates the complex and laborious multi-stage pre-training procedures. We achieve the…
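
The self-attention and cross-attention mentioned in the abstract can be illustrated with a generic scaled dot-product attention sketch. This is not the TransFlow implementation: the function names, the token counts, and the random features below are hypothetical, and the learned query/key/value projections, multi-head structure, and positional encodings that a real transformer would use are omitted. It only shows how every token of one frame can weigh every token of an adjacent frame, which is what allows global (rather than local) matching.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each query attends to all keys.

    queries: N tokens of dimension d (assumed: features of frame t)
    keys, values: M tokens (assumed: features of frame t or t+1)
    Returns (outputs, weights): N output vectors and the N x M weights.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key token (global, not windowed).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value tokens.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(out_dim)]
        )
    return outputs, weights


# Hypothetical toy features: 6 tokens of dimension 4 per frame.
random.seed(0)
num_tokens, dim = 6, 4
frame_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
frame_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

# Self-attention: tokens of one frame attend to each other.
self_out, self_w = attention(frame_a, frame_a, frame_a)
# Cross-attention: tokens of frame_a attend to tokens of frame_b.
cross_out, cross_w = attention(frame_a, frame_b, frame_b)
```

Each row of `cross_w` is a probability distribution over the tokens of the second frame, so it can be read as a soft correspondence between the two frames under these assumptions.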

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Robotics and Sensor-Based Localization

Methods: Self-Learning