Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy

Long Lian; Zhirong Wu; Stella X. Yu

arXiv:2212.08816 · cs.CV · December 20, 2022

TL;DR

IMAS is a two-stage unsupervised video object segmentation method that combines motion and appearance cues. It handles misleading motion signals and improves segmentation accuracy without manual annotations.

Contribution

The paper proposes a novel two-stage training framework with motion-appearance synergy and a motion-semantic alignment method for hyperparameter tuning, advancing unsupervised video object segmentation.

Findings

01 Surpasses previous methods by 8.3% on the DAVIS16 benchmark

02 Handles deformable objects and objects with reflections, where motion cues can mislead

03 Reduces reliance on manual hyperparameter tuning

Abstract

We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference. Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as either input or supervision for segmentation. However, motion signals may be uninformative or even misleading in cases such as deformable objects and objects with reflections, causing unsatisfactory segmentation. In contrast, IMAS achieves Improved UVOS with Motion-Appearance Synergy. Our method has two training stages: 1) a motion-supervised object discovery stage that deals with motion-appearance conflicts through a learnable residual pathway; 2) a refinement stage with both low- and high-level appearance supervision to correct model misconceptions learned from misleading motion cues. Additionally, we propose motion-semantic alignment as a…

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Average Pooling · Residual Block · Batch Normalization · 1x1 Convolution · Convolution · Global Average Pooling · Kaiming Initialization · Residual Connection · Bottleneck Residual Block