FlowCut: Unsupervised Video Instance Segmentation via Temporal Mask Matching

Alp Eren Sari; Paolo Favaro

arXiv:2505.13174 · cs.CV · May 20, 2025

TL;DR

FlowCut introduces an unsupervised video instance segmentation method that constructs a pseudo-labeled dataset through a three-stage process, enabling training without manual annotations and achieving state-of-the-art results.

Contribution

To the authors' knowledge, FlowCut is the first attempt to curate a pseudo-labeled video dataset for unsupervised video instance segmentation; a video segmentation model trained on this dataset achieves state-of-the-art benchmark performance.

Findings

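01 Achieves state-of-the-art results on YouTubeVIS-2019, YouTubeVIS-2021, DAVIS-2017, and DAVIS-2017 Motion.

02 Constructs high-quality pseudo-labels that allow training without manual annotations.

03 Demonstrates that temporal mask matching yields short video segments with consistent pseudo-instance masks.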

Abstract

We propose FlowCut, a simple and capable method for unsupervised video instance segmentation consisting of a three-stage framework to construct a high-quality video dataset with pseudo labels. To our knowledge, our work is the first attempt to curate a video dataset with pseudo-labels for unsupervised video instance segmentation. In the first stage, we generate pseudo-instance masks by exploiting the affinities of features from both images and optical flows. In the second stage, we construct short video segments containing high-quality, consistent pseudo-instance masks by temporally matching them across the frames. In the third stage, we use the YouTubeVIS-2021 video dataset to extract our training instance segmentation set, and then train a video segmentation model. FlowCut achieves state-of-the-art performance on the YouTubeVIS-2019, YouTubeVIS-2021, DAVIS-2017, and DAVIS-2017 Motion…
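The second stage described above, temporally matching pseudo-instance masks across frames to keep only consistent short segments, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of mask IoU as the matching score, the greedy one-to-one assignment, the 0.5 threshold, the rule that a transition is "consistent" only when every mask in both frames is matched, and the hypothetical function names `mask_iou`, `match_masks`, and `consistent_segments`.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection-over-union of two boolean masks with the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def match_masks(prev_masks, next_masks, iou_threshold=0.5):
    """Greedy one-to-one matching between the masks of two frames.

    Candidate pairs are visited in order of decreasing IoU (an assumed
    criterion); pairs below the threshold are never matched.
    Returns a list of (prev_index, next_index, iou) tuples.
    """
    candidates = []
    for i, p in enumerate(prev_masks):
        for j, n in enumerate(next_masks):
            iou = mask_iou(p, n)
            if iou >= iou_threshold:
                candidates.append((iou, i, j))
    candidates.sort(reverse=True)

    used_prev, used_next, matches = set(), set(), []
    for iou, i, j in candidates:
        if i in used_prev or j in used_next:
            continue
        used_prev.add(i)
        used_next.add(j)
        matches.append((i, j, iou))
    return sorted(matches)


def consistent_segments(frames, iou_threshold=0.5, min_length=2):
    """Split a video into runs of frames whose masks match frame to frame.

    `frames` is a list with one entry per frame, each entry a list of
    boolean masks. A transition counts as consistent (assumed rule) when
    both frames hold the same non-zero number of masks and all of them
    are matched. Returns inclusive (start, end) frame index pairs.
    """
    segments, start = [], 0
    for t in range(len(frames) - 1):
        a, b = frames[t], frames[t + 1]
        matches = match_masks(a, b, iou_threshold)
        consistent = len(a) > 0 and len(a) == len(b) == len(matches)
        if not consistent:
            if t - start + 1 >= min_length:
                segments.append((start, t))
            start = t + 1
    if len(frames) - start >= min_length:
        segments.append((start, len(frames) - 1))
    return segments
```

In this sketch a single object that drifts by one pixel per frame stays in one segment, while an abrupt jump in the mask position closes the current segment and starts a new one. A production pipeline would likely replace the greedy assignment with an optimal one and could warp masks with optical flow before comparing them; both are design choices outside what the abstract states.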

Taxonomy

Topics: Video Analysis and Summarization · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis