In Defense of Online Models for Video Instance Segmentation
Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

TL;DR
This paper introduces a contrastive learning-based online framework for video instance segmentation that outperforms existing methods by learning more discriminative embeddings, effectively handling long videos and occlusions.
Contribution
The paper proposes a simple yet effective online method that uses contrastive learning to improve instance association across frames, closing the performance gap with offline models.
Findings
Achieves 49.5 AP on YouTube-VIS 2019, surpassing prior online and offline methods.
Attains 30.2 AP on OVIS, outperforming previous approaches.
Won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022).
Abstract
In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models gradually attracted less attention possibly due to their inferior performance. However, online methods have their inherent advantage in handling long video sequences and ongoing videos while offline models fail due to the limit of computational resources. Therefore, it would be highly desirable if online models can achieve comparable or even better performance than offline models. By dissecting current online models and offline models, we demonstrate that the main cause of the performance gap is the error-prone association between frames caused by the similar appearance among different instances in the feature space. Observing this, we propose an online framework based on contrastive learning that is able to learn more discriminative instance embeddings for association and…
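The two ideas the abstract names, contrastive training of instance embeddings and embedding-based association between frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the loss is a generic InfoNCE form, the matching is a plain greedy one-to-one assignment by cosine similarity, and every name and value here (`info_nce`, `associate`, `temperature`, `min_similarity`) is a hypothetical choice made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one query embedding (assumed form, not the paper's).

    The loss is small when the query is close to its positive (the same
    instance in another frame) and far from the negatives (other instances).
    """
    q = normalize(query)
    pos_logit = dot(q, normalize(positive)) / temperature
    logits = [pos_logit] + [dot(q, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - pos_logit


def associate(track_embeddings, frame_embeddings, min_similarity=0.5):
    """Greedy one-to-one matching of existing tracks to current-frame instances.

    track_embeddings: dict mapping track id -> embedding
    frame_embeddings: list of embeddings detected in the current frame
    Returns a dict mapping track id -> index into frame_embeddings.
    """
    pairs = []
    for track_id, t in track_embeddings.items():
        for j, f in enumerate(frame_embeddings):
            pairs.append((dot(normalize(t), normalize(f)), track_id, j))
    # Consider the most similar pairs first.
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches, used = {}, set()
    for similarity, track_id, j in pairs:
        if similarity < min_similarity:
            break  # all remaining pairs are even less similar
        if track_id in matches or j in used:
            continue
        matches[track_id] = j
        used.add(j)
    return matches
```

In this sketch, more discriminative embeddings widen the similarity gap between the correct pair and every other pair, which is what makes a simple frame-by-frame matching step less error-prone.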
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Contrastive Learning
