SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Brendan Duke, Abdalla Ahmed, Christian Wolf, Parham Aarabi, and Graham W. Taylor

arXiv:2101.08833 · cs.CV · March 30, 2021

TL;DR

This paper introduces SSTVOS, a scalable Transformer-based approach to video object segmentation (VOS) that uses sparse spatiotemporal attention to reach competitive accuracy with improved scalability and robustness to occlusions.

Contribution

The paper presents a novel end-to-end Transformer model with sparse attention for VOS, addressing scalability and error propagation issues of prior recurrent methods.

Findings

01. Achieves competitive results on the YouTube-VOS and DAVIS 2017 datasets.

02. Improves scalability and robustness to occlusions compared with the state of the art.

03. Shows attention-based networks to be more effective than recurrent networks in the spatiotemporal domain.

Abstract

In this paper we introduce a Transformer-based approach to video object segmentation (VOS). To address compounding error and scalability issues of prior work, we propose a scalable, end-to-end method for VOS called Sparse Spatiotemporal Transformers (SST). SST extracts per-pixel representations for each object in a video using sparse attention over spatiotemporal features. Our attention-based formulation for VOS allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation. We demonstrate the effectiveness of attention-based over recurrent networks in the spatiotemporal domain. Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness to occlusions compared with the state of the art. Code is available at…
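To make the idea of sparse attention over spatiotemporal features concrete, here is a minimal sketch in plain Python. The abstract does not say which sparsity pattern SST uses, so this sketch assumes one for illustration: each pixel of the current frame attends only to a small spatial window around the same location in each history frame. The function name, argument layout, and `radius` parameter are hypothetical, and this is not the authors' implementation.

```python
import math


def sparse_spatiotemporal_attention(queries, keys, values, radius=1):
    """Windowed attention from one frame onto a history of frames.

    Assumed layout (illustrative only):
      queries: H x W x C nested lists (current-frame features)
      keys:    T x H x W x C nested lists (history-frame features)
      values:  T x H x W x D nested lists (what gets aggregated)
    Each query pixel attends to keys within `radius` pixels of its own
    location in every history frame, instead of to all T*H*W keys.
    """
    height, width = len(queries), len(queries[0])
    channels = len(queries[0][0])
    scale = 1.0 / math.sqrt(channels)  # standard dot-product scaling
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            q = queries[y][x]
            scores, picked = [], []
            for t in range(len(keys)):
                # Clip the window at the image border.
                for ky in range(max(0, y - radius), min(height, y + radius + 1)):
                    for kx in range(max(0, x - radius), min(width, x + radius + 1)):
                        k = keys[t][ky][kx]
                        scores.append(scale * sum(a * b for a, b in zip(q, k)))
                        picked.append(values[t][ky][kx])
            # Numerically stable softmax over the sparse key set.
            peak = max(scores)
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            dim = len(picked[0])
            row.append([
                sum(w * v[d] for w, v in zip(weights, picked)) / total
                for d in range(dim)
            ])
        output.append(row)
    return output
```

Under this assumed pattern, each query is compared against at most T·(2·radius+1)² keys rather than all T·H·W keys of a dense attention, which is where the scalability benefit of a sparse formulation would come from. Attending over several history frames at once, rather than passing state frame to frame, is also what lets the model look past a frame in which the object is occluded.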

Code & Models

Repositories

dukebw/SSTVOS (PyTorch, official)

Taxonomy

Methods: VOS