Dual Temporal Memory Network for Efficient Video Object Segmentation

Kaihua Zhang; Long Wang; Dong Liu; Bo Liu; Qingshan Liu; Zhu Li

arXiv:2003.06125 · cs.CV · March 16, 2020 · 1 citation

TL;DR

This paper introduces a dual temporal memory network for semi-supervised video object segmentation, leveraging short-term and long-term memories to improve accuracy and robustness against occlusions and drift.

Contribution

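The dual memory architecture pairs graph-based modeling of local spatio-temporal interactions (short-term memory) with a Simplified Gated Recurrent Unit (S-GRU) that models the long-term evolution of the target object, improving video object segmentation (VOS) performance.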
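The long-term memory path can be illustrated as one recurrent update per frame. This is a hypothetical stand-in, not the paper's S-GRU: the summary does not give its equations, so the sketch assumes a single-gate cell in the style of the minimal gated unit (one gate plus a candidate state, with biases omitted), applied to flattened per-frame feature vectors. A real VOS model would more likely use convolutional gates over spatial feature maps. The names `SimplifiedGRUCell` and `run_long_term_memory` are illustrative only.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SimplifiedGRUCell:
    """Hypothetical single-gate recurrent cell (minimal-gated-unit style).

    Assumed stand-in for the paper's S-GRU; biases are omitted for brevity.
    """

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.w_f, self.u_f = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_h, self.u_h = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def step(self, x: Vector, h_prev: Vector) -> Vector:
        # Single gate f in (0, 1): how much of the memory to overwrite this frame.
        f = [sigmoid(a + b) for a, b in zip(matvec(self.w_f, x), matvec(self.u_f, h_prev))]
        # Candidate memory, computed from the input and the gated previous memory.
        gated = [fi * hi for fi, hi in zip(f, h_prev)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.w_h, x), matvec(self.u_h, gated))]
        # Convex blend of old memory and candidate, so values stay in (-1, 1).
        return [(1.0 - fi) * hi + fi * ci for fi, hi, ci in zip(f, h_prev, cand)]


def run_long_term_memory(cell: SimplifiedGRUCell, frame_features: List[Vector]) -> Vector:
    """Fold per-frame features into one long-term memory state, frame by frame."""
    h = [0.0] * cell.hidden_dim
    for x in frame_features:
        h = cell.step(x, h)
    return h


if __name__ == "__main__":
    cell = SimplifiedGRUCell(input_dim=4, hidden_dim=3)
    frames = [[0.1 * t, 0.2, -0.3, 0.05 * t] for t in range(8)]
    print(run_long_term_memory(cell, frames))
```

The single gate needs two weight sets instead of a standard GRU's three, which is one plausible way to "simplify" a GRU; whether the paper's S-GRU does exactly this is not stated here.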

Findings

01. Achieves competitive results on the DAVIS 2016, DAVIS 2017, and YouTube-VOS datasets.
02. Offers a favorable balance between segmentation speed and accuracy.
03. Remains robust against occlusions and drift errors.

Abstract

Video Object Segmentation (VOS) is typically formulated in a semi-supervised setting. Given the ground-truth segmentation mask on the first frame, the task of VOS is to track and segment one or more objects of interest in the remaining frames of the video at the pixel level. One of the fundamental challenges in VOS is how to make the best use of temporal information to boost performance. We present an end-to-end network that stores short- and long-term information from the frames preceding the current frame as temporal memories, addressing temporal modeling in VOS. Our network consists of two temporal sub-networks: a short-term memory sub-network and a long-term memory sub-network. The short-term memory sub-network models the fine-grained spatial-temporal interactions between local regions across neighboring frames in video via a graph-based learning…
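The short-term path described in the abstract can be pictured as message passing on a graph whose nodes are local region features. This is an assumed, simplified reading rather than the paper's method: every region of the neighboring frames is treated as a node connected to every region of the current frame, edges are weighted by scaled dot-product similarity normalized with a softmax, and one round of aggregation is followed by a residual update. The graph construction and update rule in the paper are not given in this excerpt, and `short_term_message_passing` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def short_term_message_passing(current: List[Vector], neighbors: List[List[Vector]]) -> List[Vector]:
    """One round of message passing from neighboring-frame regions to current-frame regions.

    current:   region features of the current frame, one vector per region.
    neighbors: for each neighboring frame, its list of region feature vectors.
    """
    memory_nodes = [region for frame in neighbors for region in frame]
    dim = len(current[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for node in current:
        # Edge weights: softmax over scaled similarities to every memory node.
        weights = softmax([scale * dot(node, m) for m in memory_nodes])
        # Message: similarity-weighted sum of memory-node features.
        message = [sum(w * m[c] for w, m in zip(weights, memory_nodes)) for c in range(dim)]
        # Residual update keeps the node's own features and adds temporal context.
        updated.append([x + y for x, y in zip(node, message)])
    return updated


if __name__ == "__main__":
    current_frame = [[1.0, 0.0], [0.0, 1.0]]
    previous_frames = [[[0.9, 0.1], [0.2, 0.8]], [[1.0, 0.0]]]
    print(short_term_message_passing(current_frame, previous_frames))
```

Restricting the memory nodes to a few neighboring frames keeps the graph small, which is consistent with the short-term sub-network's focus on fine-grained local interactions; long-range context is left to the recurrent long-term memory.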

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
