Two-Level Temporal Relation Model for Online Video Instance Segmentation
Çağan Selim Çoban, Oğuzhan Keskin, Jordi Pont-Tuset, Fatma Güney

TL;DR
This paper introduces an online video instance segmentation model that achieves state-of-the-art performance by encoding temporal relations with a message-passing graph neural network and fusing multi-scale features effectively.
Contribution
The paper presents a novel online method with a message-passing graph neural network and a feature fusion module, matching offline methods' performance in video instance segmentation.
Findings
Achieves state-of-the-art performance among online VIS methods on the YouTube-VIS dataset.
Demonstrates strong generalization to video object segmentation on DAVIS.
Proposes an end-to-end trainable model with effective temporal encoding.
Abstract
In Video Instance Segmentation (VIS), current approaches either focus on the quality of the results, by taking the whole video as input and processing it offline, or on speed, by handling it frame by frame at the cost of competitive performance. In this work, we propose an online method that is on par with the performance of the offline counterparts. We introduce a message-passing graph neural network that encodes objects and relates them through time. We additionally propose a novel module to fuse features from the feature pyramid network with residual connections. Our model, trained end-to-end, achieves state-of-the-art performance on the YouTube-VIS dataset among online methods. Further experiments on DAVIS demonstrate the generalization capability of our model to the video object segmentation task. Code is available at: https://github.com/caganselim/TLTM
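To make the idea of relating objects through time a little more concrete, here is a minimal NumPy sketch of one generic message-passing step from the objects of a previous frame to the objects of the current frame. It is not the authors' architecture: the function name `message_passing_step`, the weight matrices `w_msg` and `w_upd`, the ReLU nonlinearities, the mean aggregation, and the residual update are all assumptions chosen for illustration.

```python
import numpy as np


def message_passing_step(prev_nodes, curr_nodes, w_msg, w_upd):
    """One illustrative message-passing round between two frames.

    prev_nodes: (P, D) object embeddings from frame t-1
    curr_nodes: (C, D) object embeddings from frame t
    w_msg:      (2D, D) hypothetical weights of the message function
    w_upd:      (2D, D) hypothetical weights of the update function
    Returns updated (C, D) embeddings for the current frame.
    """
    num_prev = prev_nodes.shape[0]
    num_curr = curr_nodes.shape[0]

    # With no previous objects there is nothing to aggregate.
    if num_prev == 0:
        return curr_nodes.copy()

    # Pair every current object with every previous object: (C, P, 2D).
    pairs = np.concatenate(
        [
            np.repeat(curr_nodes[:, None, :], num_prev, axis=1),
            np.repeat(prev_nodes[None, :, :], num_curr, axis=0),
        ],
        axis=-1,
    )

    # Message function: linear layer + ReLU on each pair, giving (C, P, D).
    messages = np.maximum(pairs @ w_msg, 0.0)

    # Aggregate incoming messages per current object (mean is an assumption).
    aggregated = messages.mean(axis=1)

    # Update function: combine each node with its aggregated message.
    update_in = np.concatenate([curr_nodes, aggregated], axis=-1)
    update = np.maximum(update_in @ w_upd, 0.0)

    # Residual update keeps the original embedding (also an assumption).
    return curr_nodes + update


rng = np.random.default_rng(0)
prev = rng.normal(size=(3, 8))   # 3 objects in the previous frame
curr = rng.normal(size=(2, 8))   # 2 objects in the current frame
w_msg = rng.normal(size=(16, 8)) * 0.1
w_upd = rng.normal(size=(16, 8)) * 0.1
out = message_passing_step(prev, curr, w_msg, w_upd)
print(out.shape)  # (2, 8)
```

Because each step only needs the previous frame's embeddings, a scheme of this kind fits the online, frame-by-frame setting the abstract describes. The actual model is available in the linked repository.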
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Graph Neural Network
