DeVOS: Flow-Guided Deformable Transformer for Video Object Segmentation

Volodymyr Fedynyak; Yaroslav Romanus; Bohdan Hlovatskyi; Bohdan Sydor; Oles Dobosevych; Igor Babin; Roman Riazantsev

arXiv:2405.08715·cs.CV·May 15, 2024

TL;DR

DeVOS introduces a novel deformable transformer architecture for video object segmentation that leverages motion-guided attention and optical flow to enhance temporal consistency and robustness in long-term tracking.

Contribution

The paper proposes DeVOS, combining memory-based matching with motion-guided deformable attention, improving long-term video object segmentation performance and stability.

Findings

01. Achieves top performance on the DAVIS 2017 and YouTube-VOS datasets.

02. Introduces ADVA for adaptive local similarity search.

03. Maintains consistent runtime and memory efficiency.

Abstract

Recent works on Video Object Segmentation have achieved remarkable results by matching dense semantic and instance-level features between the current and previous frames for long-term propagation. Nevertheless, global feature matching ignores scene motion context and therefore fails to satisfy temporal consistency. Even though some methods introduce a local matching branch to achieve smooth propagation, they fail to model complex appearance changes due to the constraints of the local window. In this paper, we present DeVOS (Deformable VOS), an architecture for Video Object Segmentation that combines memory-based matching with motion-guided propagation, resulting in stable long-term modeling and strong temporal consistency. For short-term local propagation, we propose a novel attention mechanism, ADVA (Adaptive Deformable Video Attention), allowing the adaptation of the similarity search region to…
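The idea of motion-guided local matching can be pictured with a small sketch. This is a simplified stand-in and not the paper's ADVA: it assumes an integer optical flow field that shifts the center of a fixed square search window, whereas the abstract describes an adaptive search region. The function name `flow_guided_local_propagation` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def flow_guided_local_propagation(query_feat, key_feat, prev_mask, flow, radius=1):
    """Propagate a previous-frame mask by flow-shifted local attention.

    Illustrative assumptions (not taken from the paper):
      query_feat: (H, W, C) features of the current frame
      key_feat:   (H, W, C) features of the previous frame
      prev_mask:  (H, W) soft mask of the previous frame
      flow:       (H, W, 2) integer (dy, dx) offsets, current -> previous frame
      radius:     half-size of the square search window
    """
    H, W, C = query_feat.shape
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Shift the window center by the flow at this pixel.
            cy = y + int(flow[y, x, 0])
            cx = x + int(flow[y, x, 1])
            scores, values = [], []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ky, kx = cy + dy, cx + dx
                    if 0 <= ky < H and 0 <= kx < W:
                        # Scaled dot-product similarity to one key in the window.
                        scores.append(query_feat[y, x] @ key_feat[ky, kx] / np.sqrt(C))
                        values.append(prev_mask[ky, kx])
            if not scores:
                continue  # window fell entirely outside the frame
            s = np.array(scores)
            w = np.exp(s - s.max())  # numerically stable softmax
            w /= w.sum()
            out[y, x] = float(w @ np.array(values))
    return out
```

With zero flow and a small window, a fast-moving object can leave the search region entirely; shifting the window by the flow keeps the true match inside it, which is the intuition behind combining local matching with motion guidance.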

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques