Rethinking RAFT for Efficient Optical Flow

Navid Eslami; Farnoosh Arefi; Amir M. Mansourian; Shohreh Kasaei

arXiv:2401.00833 · cs.CV · January 2, 2024 · 2 cites

TL;DR

This paper introduces Ef-RAFT, a RAFT-based optical flow method that adds an attention mechanism and a new lookup operator to better handle large displacements and repetitive patterns, and reports accuracy gains over RAFT on Sintel and KITTI.

Contribution

It proposes two components, Attention-based Feature Localization (AFL) and an Amorphous Lookup Operator (ALO), to improve RAFT's accuracy and convergence speed in optical flow estimation.

Findings

01

Achieves a 10% improvement over RAFT on the Sintel dataset

02

Achieves a 5% improvement over RAFT on the KITTI dataset

03

Attains these gains at the cost of a 33% reduction in speed and a 13% increase in memory usage

Abstract

Despite significant progress in deep learning-based optical flow methods, accurately estimating large displacements and repetitive patterns remains a challenge. The limitations of local features and similarity search patterns used in these algorithms contribute to this issue. Additionally, some existing methods suffer from slow runtime and excessive graphics memory consumption. To address these problems, this paper proposes a novel approach based on the RAFT framework. The proposed Attention-based Feature Localization (AFL) approach incorporates the attention mechanism to handle global feature extraction and address repetitive patterns. It introduces an operator for matching pixels with corresponding counterparts in the second frame and assigning accurate flow values. Furthermore, an Amorphous Lookup Operator (ALO) is proposed to enhance convergence speed and improve RAFT's ability to…
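
The idea of matching each pixel against candidates in the second frame and reading off a flow value can be illustrated with a generic sketch. This is not the paper's AFL or ALO, whose details are not given in the text above; it is a plain all-pairs similarity volume followed by a soft-argmax. The function names, the use of cosine similarity, and the `temperature` parameter are all assumptions made for illustration.

```python
import numpy as np


def all_pairs_correlation(f1, f2):
    """Cosine similarity between every pixel of f1 and every pixel of f2.

    f1, f2: (H, W, D) feature maps. Returns an (H, W, H, W) volume.
    Cosine similarity is an assumption that keeps this toy example well scaled.
    """
    n1 = f1 / np.linalg.norm(f1, axis=-1, keepdims=True)
    n2 = f2 / np.linalg.norm(f2, axis=-1, keepdims=True)
    return np.einsum("ijd,kld->ijkl", n1, n2)


def soft_argmax_flow(corr, temperature=0.01):
    """Turn a similarity volume into a flow field of shape (H, W, 2) as (dx, dy).

    Each source pixel takes a softmax over all frame-2 positions and uses the
    expected matched coordinate minus its own coordinate as its flow.
    """
    h, w = corr.shape[:2]
    logits = corr.reshape(h, w, h * w) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)

    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    targets = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    source = np.stack([xs, ys], axis=-1).astype(np.float64)

    expected = prob @ targets  # (H, W, 2) expected match location
    return expected - source


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame1 = rng.standard_normal((6, 8, 64))   # hypothetical feature map
    frame2 = np.roll(frame1, shift=2, axis=1)  # same features moved 2 px in x
    flow = soft_argmax_flow(all_pairs_correlation(frame1, frame2))
    print(flow[0, 0])
```

Because every source pixel is compared with every target pixel, the volume grows quadratically with the number of pixels. That growth is one reason memory consumption is a concern for methods in this family.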

Code & Models

Repositories

n3slami/Ef-RAFT (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Retinal Imaging and Analysis · Image Processing Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings