Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency

Farnoosh Arefi; Amir M. Mansourian; Shohreh Kasaei

arXiv:2408.16661·cs.CV·November 26, 2024

TL;DR

This paper introduces Eigen-Cluster VIS, a weakly-supervised video instance segmentation method that leverages spatio-temporal consistency through a temporal eigenvalue loss and a clip-level cluster-quality coefficient, achieving competitive results without mask annotations.

Contribution

The work presents a new weakly-supervised approach for VIS that uses a Temporal Eigenvalue Loss and a Quality Cluster Coefficient to improve segmentation stability and quality without requiring mask annotations.

Findings

01. Achieves competitive accuracy on the YouTube-VIS and OVIS datasets.

02. Reduces temporal discontinuities in the segmentation masks across frames.

03. Narrows the performance gap between supervised and weakly-supervised methods.

Abstract

The performance of Video Instance Segmentation (VIS) methods has improved significantly with the advent of transformer networks. However, these networks often face challenges in training due to the high annotation cost. To address this, unsupervised and weakly-supervised methods have been developed to reduce the dependency on annotations. This work introduces a novel weakly-supervised method called Eigen-Cluster VIS that, without requiring any mask annotations, achieves competitive accuracy compared to other VIS approaches. This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Cluster Coefficient (QCC). The TEL ensures temporal coherence by leveraging the eigenvalues of the Laplacian matrix derived from graph adjacency matrices. By minimizing the mean absolute error between the eigenvalues of adjacent frames, this loss function promotes…
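The TEL idea described in the abstract (Laplacian eigenvalues per frame, mean absolute error between adjacent frames) can be illustrated with a small NumPy sketch. This is not the authors' code; the official PyTorch implementation lives in farnooshar/eigenclustervis and may differ. Several choices here are assumptions made for illustration only: each frame is represented by per-patch feature vectors, the adjacency matrix is the non-negative cosine similarity between patches, the Laplacian is the unnormalized L = D − W, only the k smallest eigenvalues are compared, and the function names are hypothetical.

```python
import numpy as np


def affinity_matrix(features: np.ndarray) -> np.ndarray:
    """Hypothetical adjacency: non-negative cosine similarity between patches.

    features: (num_patches, dim) array for one frame (assumed representation).
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)
    w = unit @ unit.T
    w = 0.5 * (w + w.T)          # enforce exact symmetry
    w = np.clip(w, 0.0, None)    # keep only non-negative affinities (assumption)
    np.fill_diagonal(w, 0.0)     # no self-loops
    return w


def laplacian_eigenvalues(w: np.ndarray, k: int) -> np.ndarray:
    """Smallest k eigenvalues of the unnormalized Laplacian L = D - W (assumption)."""
    degree = np.diag(w.sum(axis=1))
    lap = degree - w
    eigvals = np.linalg.eigvalsh(lap)  # ascending order for symmetric matrices
    return eigvals[:k]


def temporal_eigenvalue_loss(clip: np.ndarray, k: int = 4) -> float:
    """Mean absolute error between Laplacian spectra of adjacent frames.

    clip: (num_frames, num_patches, dim) patch features for one video clip.
    """
    if clip.shape[0] < 2:
        raise ValueError("need at least two frames to compare adjacent spectra")
    spectra = [laplacian_eigenvalues(affinity_matrix(frame), k) for frame in clip]
    # MAE between the spectra of each pair of consecutive frames.
    diffs = [np.mean(np.abs(a - b)) for a, b in zip(spectra[:-1], spectra[1:])]
    # Average over all adjacent frame pairs in the clip.
    return float(np.mean(diffs))
```

Because the sketch compares only spectra, the loss does not need patches in adjacent frames to be matched one-to-one (eigenvalues are invariant to reordering the graph nodes). A clip whose frames are identical yields a loss of 0, while changes in the affinity structure between frames increase it.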


Code & Models

Repositories

farnooshar/eigenclustervis
PyTorch · Official


Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Video Analysis and Summarization