Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?

Chen Liang; Qiang Guo; Xiaochao Qu; Luoqi Liu; Ting Liu

arXiv:2408.10627·cs.CV·August 21, 2024

Open Access

TL;DR

This paper introduces Masked Video Consistency, a novel training strategy that improves video segmentation by enhancing spatial-temporal features and contextual understanding, leading to state-of-the-art results.

Contribution

It proposes Masked Video Consistency and Object Masked Attention to improve video segmentation accuracy without increasing model complexity.

Findings

01 Achieves state-of-the-art performance on five datasets.

02 Enhances temporal modeling with Object Masked Attention.

03 Improves segmentation consistency across frames.

Abstract

Video segmentation aims to partition video sequences into meaningful segments based on objects or regions of interest within frames. Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets. This leads to inconsistent segmentation results across frames. To address these issues, we propose Masked Video Consistency (MVC), a training strategy that enhances spatial and temporal feature aggregation. MVC randomly masks image patches, compelling the network to predict the entire semantic segmentation, thus improving contextual information integration. Additionally, we introduce Object Masked Attention (OMA) to optimize the cross-attention mechanism by reducing the impact of irrelevant queries, thereby enhancing temporal modeling capabilities. Our approach,…
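
The random patch masking described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_patch_mask`, the patch size, the mask ratio, zero-filling of masked patches, and drawing the mask independently per frame are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def random_patch_mask(frames, patch_size=16, mask_ratio=0.5, rng=None):
    """Zero out randomly chosen square patches in every frame of a clip.

    Hypothetical helper, not taken from the paper.

    frames: array of shape (T, H, W, C); H and W must be divisible by patch_size.
    Returns (masked_frames, keep_mask), where keep_mask has shape (T, H, W)
    and holds 1 for kept pixels and 0 for masked pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, _ = frames.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")

    grid_h, grid_w = h // patch_size, w // patch_size
    # One independent draw per patch and per frame (assumption):
    # a patch is kept when its draw is at least mask_ratio.
    keep_grid = rng.random((t, grid_h, grid_w)) >= mask_ratio

    # Expand the patch-level grid to pixel resolution.
    keep_mask = np.repeat(keep_grid, patch_size, axis=1)
    keep_mask = np.repeat(keep_mask, patch_size, axis=2).astype(np.uint8)

    # Masked patches are filled with zeros (assumption).
    masked_frames = frames * keep_mask[..., None].astype(frames.dtype)
    return masked_frames, keep_mask


# Usage: mask roughly half of the 16x16 patches in a 4-frame clip.
clip = np.ones((4, 64, 64, 3), dtype=np.float32)
masked_clip, keep = random_patch_mask(clip, rng=np.random.default_rng(0))
```

In a training loop following the abstract's description, the masked clip would be fed to the network while the loss is still computed against the segmentation of the entire frame, so the model has to infer masked regions from surrounding spatial and temporal context.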

Taxonomy

Topics: Subtitles and Audiovisual Media · Video Analysis and Summarization · Human Motion and Animation

Methods: Softmax · Attention Is All You Need