Siamese Network with Interactive Transformer for Video Object Segmentation

Meng Lan; Jing Zhang; Fengxiang He; Lefei Zhang

arXiv:2112.13983 · cs.CV · December 30, 2021

TL;DR

This paper introduces SITVOS, a Siamese network with an interactive transformer that propagates spatio-temporal context from past frames to the current frame for semi-supervised video object segmentation, and reports better results than prior state-of-the-art methods on three benchmarks.

Contribution

The paper presents a novel Siamese network with an interactive transformer and feature interaction module for improved context propagation in VOS.

Findings

01 Outperforms state-of-the-art methods on three benchmarks.

02 Efficient feature reuse via Siamese architecture.

03 Effective spatio-temporal context encoding with transformer.
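
The feature-reuse finding can be illustrated with a minimal sketch. It assumes, as one plausible reading of the listing, that a single shared extractor processes every frame once, and that the features computed when a frame is the current frame are kept and reused when that frame later serves as a past frame. The names `backbone`, `SiameseFeatureCache`, and `step` are hypothetical and are not taken from the SITVOS code.

```python
def backbone(frame):
    """Hypothetical stand-in for the shared (Siamese) feature extractor."""
    return [2.0 * pixel for pixel in frame]


class SiameseFeatureCache:
    """Runs the shared extractor once per frame and keeps the result as memory."""

    def __init__(self, extractor):
        self.extractor = extractor
        self.calls = 0      # number of extractor passes, to show the reuse
        self.memory = []    # features of frames already processed

    def step(self, frame):
        # One extractor pass for the current frame.
        self.calls += 1
        current = self.extractor(frame)
        # Past-frame features come from memory; they are not recomputed.
        past = list(self.memory)
        # Assumption: the current features are stored for later reuse.
        self.memory.append(current)
        return current, past


# Toy "video" of three frames, each a flat list of pixel values.
frames = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
cache = SiameseFeatureCache(backbone)
for frame in frames:
    current_feat, past_feats = cache.step(frame)
```

In this sketch the extractor runs three times for three frames, whereas a design with a separate encoder for past frames would need an extra pass each time a frame moves into the memory.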

Abstract

Semi-supervised video object segmentation (VOS) refers to segmenting the target object in the remaining frames given its annotation in the first frame, and it has been actively studied in recent years. The key challenge lies in finding effective ways to exploit the spatio-temporal context of past frames to help learn a discriminative target representation for the current frame. In this paper, we propose a novel Siamese network with a specifically designed interactive transformer, called SITVOS, to enable effective context propagation from historical to current frames. Technically, we use the transformer encoder and decoder to handle the past frames and the current frame separately, i.e., the encoder encodes robust spatio-temporal context of the target object from the past frames, while the decoder takes the feature embedding of the current frame as the query to retrieve the target from the encoder output. To…
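
The retrieval step described in the abstract, where the current-frame embedding acts as the query against the encoder output, can be sketched with plain scaled dot-product cross-attention. This is an illustrative sketch only: the function names, the toy vectors, and the use of a single attention head without learned projections are assumptions, and the actual SITVOS decoder may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query retrieves a weighted average of values, weighted by query-key similarity."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical toy data standing in for the encoder output over past frames.
memory_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory_values = [[1.0], [0.0], [0.5]]   # e.g. a target score per memory location

# Hypothetical current-frame embeddings used as queries.
current_queries = [[4.0, 0.0], [0.0, 4.0]]

retrieved = cross_attention(current_queries, memory_keys, memory_values)
```

The first query is most similar to memory locations carrying high target scores, so it retrieves a larger value than the second query; this is the sense in which the current frame "retrieves the target" from the encoded past context.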

Code & Models

Repositories

lanmng/sitvos (PyTorch, official)

Videos

Siamese Network with Interactive Transformer for Video Object Segmentation

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Siamese Network