VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Xudong Wang; Ishan Misra; Ziyun Zeng; Rohit Girdhar; Trevor Darrell

arXiv:2308.14710·cs.CV·August 29, 2023

TL;DR

VideoCutLER introduces a simple, unsupervised approach to video instance segmentation that relies on high-quality pseudo masks and video synthesis, achieving state-of-the-art results without motion-based signals.

Contribution

It demonstrates that high-quality pseudo masks and video synthesis alone can enable effective unsupervised multi-instance video segmentation, surpassing previous methods.

Findings

01

Achieved 50.7% $\mathrm{AP}^{\mathrm{video}}_{50}$ on YouTubeVIS-2019, surpassing the prior state of the art.

02

Outperformed DINO by 15.9% $\mathrm{AP}^{\mathrm{video}}$ when used as a pretrained model for supervised video instance segmentation.

03

Showed that motion estimates such as optical flow are not necessary for effective unsupervised video instance segmentation.
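
The headline number above is an average precision measured at an IoU threshold of 0.5, where IoU is taken over whole mask tracks rather than single frames. As a rough illustration only, the sketch below computes a spatio-temporal IoU between one predicted and one ground-truth track. The function names `video_iou` and `is_match_at_50` are hypothetical, and the treatment of the threshold as "at least 0.5" is an assumption; this is not the benchmark's official evaluation code.

```python
import numpy as np


def video_iou(pred, gt):
    """Spatio-temporal IoU between two mask tracks of shape (T, H, W).

    Intersection and union are summed over all frames before dividing,
    so a track that misses the object in some frames is penalized.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def is_match_at_50(pred, gt):
    """Assumed matching rule: a prediction counts as a hit at IoU >= 0.5."""
    return video_iou(pred, gt) >= 0.5


# Toy example: two frames, masks overlapping on one of three covered rows.
pred = np.zeros((2, 4, 4), dtype=bool)
gt = np.zeros((2, 4, 4), dtype=bool)
pred[:, 0:2, :] = True
gt[:, 1:3, :] = True
print(video_iou(pred, gt))
```

Average precision then aggregates such matches over confidence-ranked predictions; that step is omitted here.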

Abstract

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos. Our key insight is that using high-quality pseudo masks and a simple video synthesis method for model training is surprisingly sufficient to enable the resulting video model to effectively segment and track multiple instances across video frames. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% $\mathrm{AP}^{\mathrm{video}}_{50}$, surpassing the previous state-of-the-art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks,…
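
The abstract does not spell out how the training videos are synthesized. One plausible reading, sketched below purely as an illustration, is to take a single image with a pseudo mask, cut out the masked object, and paste it at drifting positions onto copies of the image, which yields a short clip together with a mask track. The helpers `shift` and `synthesize_clip`, the translation-only motion, and all parameter values are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np


def shift(arr, dy, dx):
    """Translate an (H, W) or (H, W, C) array by (dy, dx), zero-filling gaps."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out


def synthesize_clip(image, mask, num_frames=3, max_step=2, seed=0):
    """Build a toy clip and mask track from one image and one pseudo mask.

    Frame 0 is the original image. Each later frame pastes the masked object
    at a randomly drifting offset on top of a copy of the original image
    (the object at its original location is left in place, as in copy-paste).
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    frames, masks = [image.copy()], [mask.copy()]
    dy = dx = 0
    for _ in range(num_frames - 1):
        # Accumulate a small random step so the object follows a trajectory.
        dy += int(rng.integers(-max_step, max_step + 1))
        dx += int(rng.integers(-max_step, max_step + 1))
        moved_mask = shift(mask, dy, dx)
        moved_pixels = shift(image, dy, dx)
        frame = image.copy()
        frame[moved_mask] = moved_pixels[moved_mask]
        frames.append(frame)
        masks.append(moved_mask)
    return np.stack(frames), np.stack(masks)


# Toy input: a 6x6 RGB image with a 2x2 pseudo mask in the middle.
image = np.arange(6 * 6 * 3, dtype=np.uint8).reshape(6, 6, 3)
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True
frames, masks = synthesize_clip(image, mask, num_frames=4)
print(frames.shape, masks.shape)
```

Because the mask moves with the pasted object, every synthetic frame comes with a consistent instance label, which is what lets a video model learn tracking without optical flow or natural videos in this reading.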

Code & Models

Repositories

facebookresearch/cutler
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Video Analysis and Summarization · Advanced Image Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer