Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
Farnoosh Arefi, Amir M. Mansourian, Shohreh Kasaei

TL;DR
This paper introduces Eigen-Cluster VIS, a weakly-supervised video instance segmentation method that leverages spatio-temporal consistency through novel loss functions and clustering metrics, achieving competitive results without mask annotations.
Contribution
The work presents a new weakly-supervised approach for VIS that uses a Temporal Eigenvalue Loss and a Quality Cluster Coefficient to improve segmentation stability and quality without requiring mask annotations.
Findings
Achieves competitive accuracy on YouTube-VIS and OVIS datasets.
Effectively reduces temporal discontinuities in segmentation.
Narrows the performance gap between supervised and weakly-supervised methods.
Abstract
The performance of Video Instance Segmentation (VIS) methods has improved significantly with the advent of transformer networks. However, these networks often face challenges in training due to the high annotation cost. To address this, unsupervised and weakly-supervised methods have been developed to reduce the dependency on annotations. This work introduces a novel weakly-supervised method called Eigen-Cluster VIS that, without requiring any mask annotations, achieves competitive accuracy compared to other VIS approaches. This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Cluster Coefficient (QCC). The TEL ensures temporal coherence by leveraging the eigenvalues of the Laplacian matrix derived from graph adjacency matrices. By minimizing the mean absolute error between the eigenvalues of adjacent frames, this loss function promotes…
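The TEL idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say how the per-frame adjacency matrices are built or which Laplacian is used, so the sketch assumes symmetric adjacency matrices of equal size and the unnormalized Laplacian L = D - A. The function names `laplacian_eigenvalues` and `temporal_eigenvalue_loss` are hypothetical.

```python
import numpy as np


def laplacian_eigenvalues(adjacency):
    """Return the ascending eigenvalues of the unnormalized Laplacian L = D - A.

    Assumption: `adjacency` is a symmetric (n, n) matrix for one frame's graph.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigvalsh is meant for symmetric matrices and returns sorted real eigenvalues.
    return np.linalg.eigvalsh(laplacian)


def temporal_eigenvalue_loss(adjacencies):
    """Mean absolute error between Laplacian eigenvalues of adjacent frames.

    Assumption: `adjacencies` is a sequence of at least two (n, n) matrices,
    one per frame, all with the same number of nodes.
    """
    if len(adjacencies) < 2:
        raise ValueError("need at least two frames to compare")
    eigs = np.stack([laplacian_eigenvalues(a) for a in adjacencies])
    # Compare each frame's spectrum with the next frame's spectrum.
    return float(np.mean(np.abs(eigs[1:] - eigs[:-1])))


# Toy example: a 3-node path graph versus a graph with no edges.
path_graph = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
empty_graph = np.zeros((3, 3))

same_loss = temporal_eigenvalue_loss([path_graph, path_graph])
diff_loss = temporal_eigenvalue_loss([path_graph, empty_graph])
```

Identical graphs in adjacent frames give a loss of zero, while a change in graph structure gives a positive loss. In actual training the same computation would have to be written in an automatic-differentiation framework so that gradients can flow through the eigenvalues; this NumPy version only shows the quantity being minimized.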
Taxonomy
Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Video Analysis and Summarization
