Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration

Liqi Yan; Qifan Wang; Siqi Ma; Jingang Wang; Changbin Yu

arXiv:2212.07592·cs.CV·December 16, 2022

TL;DR

This paper introduces STC-Seg, a weakly supervised framework for video instance segmentation that leverages spatio-temporal collaboration, pseudo-labels, and a novel puzzle loss to achieve competitive results with less annotation effort.

Contribution

The paper proposes a novel weakly supervised video instance segmentation framework combining depth, optical flow, and a puzzle loss for end-to-end training, outperforming some fully supervised methods.

Findings

01

Outperforms fully supervised TrackR-CNN and MaskTrack R-CNN on KITTI MOTS and YT-VIS datasets.

02

Effectively utilizes pseudo-labels from depth and optical flow for training.

03

Enhances robustness with a spatio-temporal tracking module.

Abstract

Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years. In this paper, we present a novel weakly supervised framework with Spatio-Temporal Collaboration for instance Segmentation in videos, namely STC-Seg. Concretely, STC-Seg demonstrates four contributions. First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce effective pseudo-labels for training deep networks and predicting high-quality instance masks. Second, to enhance the mask generation, we devise a puzzle loss, which enables end-to-end training using box-level annotations. Third, our tracking module jointly utilizes bounding-box diagonal points with spatio-temporal discrepancy to model movements, which largely improves the…
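
As a rough illustration of the first contribution (pseudo-labels built from depth and optical flow under box-level supervision), the sketch below marks pixels inside a bounding box whose depth and flow agree with the box's central region. It is not the STC-Seg procedure, which this page does not detail. The function name `box_pseudo_mask`, the tolerances `depth_tol` and `flow_tol`, and the "central region as reference" heuristic are all assumptions made for this example.

```python
import numpy as np


def box_pseudo_mask(depth, flow, box, depth_tol=0.15, flow_tol=1.0):
    """Toy pseudo-mask from depth and optical flow inside one box.

    Hypothetical heuristic (not taken from the paper): a pixel inside the
    box is kept when its depth and flow are close to the median depth and
    flow of the box's central region.

    depth: (H, W) array, flow: (H, W, 2) array, box: (x0, y0, x1, y1)
    with exclusive x1, y1.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros(depth.shape, dtype=bool)

    d = depth[y0:y1, x0:x1]
    f = flow[y0:y1, x0:x1]
    h, w = d.shape

    # Central half of the box serves as the reference region (assumption).
    cy0, cy1 = h // 4, max(h // 4 + 1, 3 * h // 4)
    cx0, cx1 = w // 4, max(w // 4 + 1, 3 * w // 4)
    d_ref = np.median(d[cy0:cy1, cx0:cx1])
    f_ref = np.median(f[cy0:cy1, cx0:cx1].reshape(-1, 2), axis=0)

    # Keep pixels that agree with the reference in both cues.
    depth_ok = np.abs(d - d_ref) <= depth_tol * d_ref
    flow_ok = np.linalg.norm(f - f_ref, axis=-1) <= flow_tol
    mask[y0:y1, x0:x1] = depth_ok & flow_ok
    return mask
```

Requiring both cues to agree reflects the "complementary representations" idea in the abstract: depth separates an object from a distant background, while flow separates it from nearby regions that move differently.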
