Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

Nian Liu; Kepan Nan; Wangbo Zhao; Yuanwei Liu; Xiwen Yao; Salman Khan; Hisham Cholakkal; Rao Muhammad Anwer; Junwei Han; Fahad Shahbaz Khan

arXiv:2309.11160·cs.CV·September 21, 2023

TL;DR

This paper introduces a multi-grained temporal prototype learning approach for few-shot video object segmentation, leveraging local and long-term temporal cues to improve segmentation accuracy in videos with limited annotations.

Contribution

It proposes a novel multi-grained temporal guidance framework that decomposes video information into clip and memory prototypes, enhancing few-shot video segmentation performance.

Findings

01 Significantly outperforms previous models on benchmark datasets.

02 Effectively captures local and long-term temporal correlations.

03 Reduces noise influence through structural similarity-based memory selection.

Abstract

Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video that belong to the category defined by a few annotated support images. However, this task has seldom been explored. In this work, building on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information to handle the temporally correlated nature of video data. We decompose the query video information into a clip prototype and a memory prototype to capture local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and to enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural…
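To make the three prototype granularities in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the prototypes are computed, so the sketch substitutes masked average pooling, a common way to form prototypes in few-shot segmentation. All function names (`masked_average_pool`, `frame_prototypes`, `clip_prototype`, `memory_prototype`) and the array shapes are assumptions made for illustration.

```python
import numpy as np


def masked_average_pool(features, mask, eps=1e-6):
    """Average a (C, H, W) feature map over the pixels selected by an (H, W) mask.

    Assumption: masked average pooling stands in for whatever prototype
    generation the paper actually uses.
    """
    weighted = features * mask[None, :, :]
    return weighted.sum(axis=(1, 2)) / (mask.sum() + eps)


def frame_prototypes(clip_features, clip_masks):
    """One prototype per frame (fine-grained guidance).

    clip_features: (T, C, H, W), clip_masks: (T, H, W). Returns (T, C).
    """
    return np.stack(
        [masked_average_pool(f, m) for f, m in zip(clip_features, clip_masks)]
    )


def clip_prototype(clip_features, clip_masks, eps=1e-6):
    """One prototype pooled jointly over all frames of a clip (local temporal guidance)."""
    weighted = clip_features * clip_masks[:, None, :, :]
    return weighted.sum(axis=(0, 2, 3)) / (clip_masks.sum() + eps)


def memory_prototype(memory_features, memory_masks):
    """One prototype pooled over stored past frames (long-term temporal guidance).

    Assumption: the memory is simply a stack of earlier frames and their
    predicted masks; the selection of reliable memory frames is not modeled here.
    """
    return clip_prototype(memory_features, memory_masks)


# Toy usage: 3 frames, 4 channels, 5x5 feature maps, constant features per frame.
features = np.stack([np.full((4, 5, 5), float(t + 1)) for t in range(3)])
masks = np.zeros((3, 5, 5))
masks[:, 1:3, 1:3] = 1.0  # a 2x2 foreground region in every frame

per_frame = frame_prototypes(features, masks)  # shape (3, 4)
per_clip = clip_prototype(features, masks)     # shape (4,)
per_memory = memory_prototype(features[:2], masks[:2])  # shape (4,)
```

In this sketch the clip and memory prototypes differ only in which frames they pool over: the current clip versus earlier frames of the same video. The frame prototypes keep one vector per frame so that each frame can receive its own guidance.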

Code & Models

Repositories

nankepan/VIPMT (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection

Methods: Contrastive Language-Image Pre-training