Video Instance Segmentation by Instance Flow Assembly
Xiang Li, Jinglu Wang, Xiao Li, Yan Lu

TL;DR
This paper introduces a bottom-up video instance segmentation framework that leverages instance flow assembly and temporal context fusion to improve accuracy and robustness over existing methods, especially in capturing pixel-level temporal consistency.
Contribution
It proposes a novel bottom-up approach with a temporal context fusion module and instance flow for better inter-frame correspondence in video instance segmentation.
Findings
Outperforms state-of-the-art online methods on the YouTube-VIS dataset.
Effectively captures pixel-level temporal consistency across frames.
Demonstrates robustness and efficiency in instance tracking.
Abstract
Instance segmentation is a challenging task aiming at classifying and segmenting all object instances of specific classes. While two-stage box-based methods achieve top performances in the image domain, they cannot easily extend their superiority into the video domain. This is because they usually deal with features or images cropped from the detected bounding boxes without alignment, failing to capture pixel-level temporal consistency. We embrace the observation that bottom-up methods dealing with box-free features could offer accurate spatial correlations across frames, which can be fully utilized for object- and pixel-level tracking. We first propose our bottom-up framework equipped with a temporal context fusion module to better encode inter-frame correlations. Intra-frame cues for semantic segmentation and object localization are simultaneously extracted and reconstructed by…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
