Contextual Guided Segmentation Framework for Semi-supervised Video Instance Segmentation
Trung-Nghia Le, Tam V. Nguyen, Minh-Triet Tran

TL;DR
This paper introduces a comprehensive semi-supervised video instance segmentation framework that combines instance re-identification, contextual cues, and guided attention to improve segmentation accuracy across challenging scenarios.
Contribution
The proposed CGS framework integrates multiple passes and novel techniques for improved video instance segmentation, including re-identification flow, skeleton-guided segmentation, and ROI-based fine segmentation.
Findings
Achieved top performance in the DAVIS Challenge, with a global score above 75%.
Demonstrated effective handling of occlusion, deformation, and re-appearance.
Outperformed previous methods on benchmark datasets.
Abstract
In this paper, we propose the Contextual Guided Segmentation (CGS) framework for video instance segmentation in three passes. In the first pass, i.e., preview segmentation, we propose Instance Re-Identification Flow to estimate the main properties of each instance (i.e., human/non-human, rigid/deformable, known/unknown category) by propagating its preview mask to other frames. In the second pass, i.e., contextual segmentation, we introduce multiple contextual segmentation schemes. For a human instance, we develop skeleton-guided segmentation in a frame, along with object flow to correct and refine the result across frames. For a non-human instance, if the instance has a wide variation in appearance and belongs to a known category (which can be inferred from the initial mask), we adopt instance segmentation. If the non-human instance is nearly rigid, we train FCNs on synthesized images from the first…
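The second-pass routing described in the abstract can be read as a small decision rule over the properties estimated in the first pass. The sketch below is only an illustration of that reading, not the authors' implementation: the names `InstanceProperties`, `Scheme`, and `choose_scheme` are hypothetical, and any case the visible (truncated) abstract does not cover is returned as `UNSPECIFIED` rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum


class Scheme(Enum):
    """Second-pass schemes named in the abstract (labels are illustrative)."""
    SKELETON_GUIDED = "skeleton-guided segmentation with object flow"
    INSTANCE_SEGMENTATION = "instance segmentation"
    FCN_ON_SYNTHESIZED = "FCNs trained on synthesized images"
    UNSPECIFIED = "not covered by the visible abstract"


@dataclass(frozen=True)
class InstanceProperties:
    """Hypothetical container for the properties estimated in the first pass."""
    is_human: bool
    is_rigid: bool
    known_category: bool


def choose_scheme(props: InstanceProperties) -> Scheme:
    """Pick a contextual segmentation scheme from first-pass properties.

    Assumption: "wide variation in appearance" is modeled here as
    "not rigid"; the paper may use a different criterion.
    """
    if props.is_human:
        return Scheme.SKELETON_GUIDED
    if not props.is_rigid and props.known_category:
        return Scheme.INSTANCE_SEGMENTATION
    if props.is_rigid:
        return Scheme.FCN_ON_SYNTHESIZED
    # Deformable, unknown-category instances are not described in the
    # truncated abstract, so this sketch makes no claim about them.
    return Scheme.UNSPECIFIED


if __name__ == "__main__":
    horse = InstanceProperties(is_human=False, is_rigid=False, known_category=True)
    print(choose_scheme(horse).value)
```

Keeping the uncovered case explicit, instead of defaulting to one of the named schemes, avoids attributing a behavior to the framework that the summary does not state.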
