PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation
Zhangjing Yang, Dun Liu, Xin Wang, Zhe Li, Barathwaj Anandan, Yi Wu

TL;DR
This paper presents PM-VIS+, a high-performance video instance segmentation method that leverages image datasets and semi-supervised learning to eliminate the need for costly video annotations.
Contribution
It introduces a novel approach that adapts image-based annotations for video segmentation and employs pseudo masks and semi-supervised techniques to improve accuracy without manual video annotations.
Findings
Achieves competitive video segmentation performance without video annotations.
Uses ImageNet-bbox to supplement categories that are missing from video datasets.
Employs pseudo masks and semi-supervised optimization for enhanced accuracy.
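As a rough illustration of the pseudo-mask idea in the findings above, the sketch below keeps only confident predictions on unannotated frames and binarizes them so they can act as training targets. The function name, the dictionary fields, and both thresholds are hypothetical choices made for this example; the summary does not say how PM-VIS+ selects or filters its pseudo masks.

```python
def select_pseudo_masks(predictions, score_thresh=0.5, mask_thresh=0.5):
    """Turn model predictions on an unannotated frame into pseudo masks.

    ASSUMPTION: each prediction is a dict with a confidence "score",
    a "category" label, and "mask_probs" (a list of rows of per-pixel
    probabilities). These field names and thresholds are illustrative
    only and are not taken from the paper.
    """
    pseudo = []
    for pred in predictions:
        # Drop low-confidence instances so noisy masks are not used as targets.
        if pred["score"] < score_thresh:
            continue
        # Binarize the soft mask.
        binary = [[1 if p >= mask_thresh else 0 for p in row]
                  for row in pred["mask_probs"]]
        # Skip instances whose binarized mask is empty.
        if not any(any(row) for row in binary):
            continue
        pseudo.append({"category": pred["category"], "mask": binary})
    return pseudo
```

In a semi-supervised loop, the returned masks would stand in for manual annotations on video frames; a stricter score threshold trades coverage for cleaner targets.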
Abstract
Video instance segmentation requires detecting, segmenting, and tracking objects in videos, typically relying on costly video annotations. This paper introduces a method that eliminates video annotations by utilizing image datasets. The PM-VIS algorithm is adapted to handle both bounding box and instance-level pixel annotations dynamically. We introduce ImageNet-bbox to supplement missing categories in video datasets and propose the PM-VIS+ algorithm to adjust supervision based on annotation types. To enhance accuracy, we use pseudo masks and semi-supervised optimization techniques on unannotated video data. This method achieves high video instance segmentation performance without manual video annotations, offering a cost-effective solution and new perspectives for video instance segmentation applications. The code will be available at https://github.com/ldknight/PM-VIS-plus.
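One way to picture "adjusting supervision based on annotation types" is a per-instance loss that switches on what the annotation provides. The sketch below is an assumption made for illustration, not the PM-VIS+ implementation: it uses a Dice loss against the ground-truth mask when pixel annotations exist, and a box-projection loss (comparing the row and column extents of the predicted mask with those of the box, a technique known from box-supervised segmentation) when only a bounding box exists. All function names and the annotation dictionary layout are hypothetical.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two equal-length sequences of values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def project(mask, axis):
    """Max-project a 2D mask (list of rows): axis=1 gives one value per row,
    axis=0 gives one value per column."""
    if axis == 1:
        return [max(row) for row in mask]
    return [max(col) for col in zip(*mask)]


def box_to_mask(box, height, width):
    """Rasterize a box (x0, y0, x1, y1), half-open in pixels, into a 2D mask."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0
             for x in range(width)] for y in range(height)]


def instance_loss(pred_mask, annotation):
    """Pick the supervision signal from the annotation type.

    ASSUMPTION: annotation is {"type": "mask", "mask": ...} or
    {"type": "box", "box": (x0, y0, x1, y1)}; this layout is illustrative.
    """
    height, width = len(pred_mask), len(pred_mask[0])
    if annotation["type"] == "mask":
        # Full pixel-level supervision.
        flat_pred = [v for row in pred_mask for v in row]
        flat_gt = [v for row in annotation["mask"] for v in row]
        return dice_loss(flat_pred, flat_gt)
    if annotation["type"] == "box":
        # Weaker supervision: only the mask's extents must match the box.
        box_mask = box_to_mask(annotation["box"], height, width)
        return (dice_loss(project(pred_mask, 1), project(box_mask, 1))
                + dice_loss(project(pred_mask, 0), project(box_mask, 0)))
    raise ValueError("unknown annotation type: %r" % annotation["type"])
```

The box branch constrains only where the mask starts and ends along each axis, so any shape that touches all four box sides scores well. That weaker signal is one reason a method in this setting might add pseudo masks on top of box annotations.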
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
