End-to-end video instance segmentation via spatial-temporal graph neural networks
Tao Wang, Ning Xu, Kean Chen, Weiyao Lin

TL;DR
This paper introduces a unified graph neural network framework for video instance segmentation. It jointly optimizes detection, segmentation, and tracking by sharing spatial-temporal information within and across frames, which improves accuracy over methods that treat these subproblems separately.
Contribution
It presents a novel GNN-based approach that propagates spatial-temporal information across all subproblems in a unified framework, enhancing video instance segmentation.
Findings
Achieves 35.2% AP on YouTube-VIS with a ResNet-50 backbone
Runs at 22 FPS
Outperforms the existing methods it is compared against on YouTube-VIS
Abstract
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain. Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step, which limits their ability to fully leverage and share useful spatial-temporal information across all the subproblems. In this paper, we propose a novel graph-neural-network (GNN) based method to address this limitation. Specifically, graph nodes representing instance features are used for detection and segmentation, while graph edges representing instance relations are used for tracking. Both inter- and intra-frame information is effectively propagated and shared via graph updates, and all the subproblems (i.e. detection, segmentation and tracking) are jointly optimized in a unified framework. The…
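The abstract describes instance features as graph nodes (used for detection and segmentation) and instance relations as graph edges (used for tracking), with information propagated within and across frames through graph updates. The NumPy sketch below illustrates that general idea only. Everything in it is an assumption, not the authors' method: a fully connected two-frame graph, dot-product attention message passing with residual updates, random untrained weights, a sigmoid of cosine similarity standing in for a learned edge classifier, and greedy association. All function names are hypothetical, and a mask head is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_adjacency(frame_ids):
    """Hypothetical: connect every pair of distinct nodes. Same-frame pairs are
    intra-frame edges; cross-frame pairs are inter-frame edges."""
    adj = np.ones((len(frame_ids), len(frame_ids)), dtype=bool)
    np.fill_diagonal(adj, False)
    inter = frame_ids[:, None] != frame_ids[None, :]
    return adj, inter

def graph_update(h, adj, w_self, w_msg):
    """One residual message-passing step with dot-product attention over neighbors."""
    logits = np.where(adj, h @ h.T, -np.inf)   # non-neighbors get zero weight
    attn = softmax(logits, axis=1)             # each row sums to 1 over neighbors
    messages = attn @ (h @ w_msg)              # aggregate intra- and inter-frame info
    return h + np.maximum(0.0, h @ w_self + messages)  # ReLU plus residual

def class_probs(h, w_cls):
    """Node head: per-instance class probabilities (detection side)."""
    return softmax(h @ w_cls, axis=1)

def edge_scores(h, inter, scale=10.0, bias=0.5):
    """Edge head: tracking score per inter-frame edge. scale and bias are
    fixed stand-ins for a learned classifier; intra-frame edges score 0."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    scores = 1.0 / (1.0 + np.exp(-scale * (unit @ unit.T - bias)))
    return np.where(inter, scores, 0.0)

def greedy_match(scores, src, dst, threshold=0.5):
    """Link each src node to at most one dst node, highest score first."""
    pairs = sorted(((scores[i, j], i, j) for i in src for j in dst), reverse=True)
    used_src, used_dst, links = set(), set(), {}
    for s, i, j in pairs:
        if s > threshold and i not in used_src and j not in used_dst:
            links[i] = j
            used_src.add(i)
            used_dst.add(j)
    return links

rng = np.random.default_rng(0)
d, n, num_classes = 8, 3, 4
frame0 = np.eye(d)[:n]                     # toy features for 3 instances in frame t
perm = np.array([2, 0, 1])                 # frame t+1 shows the same instances, reordered
frame1 = frame0[perm] + 1e-3 * rng.standard_normal((n, d))
h = np.vstack([frame0, frame1])
frame_ids = np.array([0] * n + [1] * n)
adj, inter = build_adjacency(frame_ids)
w_self = rng.standard_normal((d, d)) / np.sqrt(d)
w_msg = rng.standard_normal((d, d)) / np.sqrt(d)
w_cls = rng.standard_normal((d, num_classes)) / np.sqrt(d)

for _ in range(2):                         # two rounds of graph updates
    h = graph_update(h, adj, w_self, w_msg)

probs = class_probs(h, w_cls)
scores = edge_scores(h, inter)
links = greedy_match(scores, range(n), range(n, 2 * n))
print(links)
```

In this toy setup, each frame-t node should be linked to the node in frame t+1 that holds its noisy, reordered copy, because both copies see nearly the same neighbors during message passing. A trained model would instead learn the update weights and the edge classifier jointly with the detection and segmentation losses.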
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
