Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy
Long Lian, Zhirong Wu, Stella X. Yu

TL;DR
IMAS is a two-stage unsupervised video object segmentation method that combines motion and appearance cues, which lets it handle misleading motion signals and improve segmentation accuracy without manual annotations.
Contribution
The paper proposes a novel two-stage training framework with motion-appearance synergy and a motion-semantic alignment method for hyperparameter tuning, advancing unsupervised video object segmentation.
Findings
Surpasses previous methods by 8.3% on the DAVIS16 benchmark
Effectively handles deformable objects and reflections
Reduces reliance on manual hyperparameter tuning
Abstract
We present IMAS, a method that segments the primary objects in videos without manual annotation in training or inference. Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as either input or supervision for segmentation. However, motion signals may be uninformative or even misleading in cases such as deformable objects and objects with reflections, causing unsatisfactory segmentation. In contrast, IMAS achieves Improved UVOS with Motion-Appearance Synergy. Our method has two training stages: 1) a motion-supervised object discovery stage that deals with motion-appearance conflicts through a learnable residual pathway; 2) a refinement stage with both low- and high-level appearance supervision to correct model misconceptions learned from misleading motion cues. Additionally, we propose motion-semantic alignment as a…
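The abstract mentions a learnable residual pathway for motion-appearance conflicts but does not spell out how it works. The sketch below is one plausible reading, not the authors' implementation: it assumes the model predicts soft segment masks, one flow vector per segment, and a per-pixel residual flow that absorbs motion a piecewise-constant model cannot explain. The names `reconstruct_flow`, `l1_flow_error`, and the toy data are hypothetical.

```python
def reconstruct_flow(masks, segment_flows, residual):
    """Combine per-segment constant flow with a per-pixel residual.

    masks:         K x H x W soft segment masks (sum to 1 over K at each pixel)
    segment_flows: K (dx, dy) vectors, one per segment
    residual:      H x W (dx, dy) corrections (assumed residual-pathway output)
    """
    height, width = len(masks[0]), len(masks[0][0])
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            # Piecewise-constant part: mask-weighted sum of segment flows.
            dx = sum(m[y][x] * v[0] for m, v in zip(masks, segment_flows))
            dy = sum(m[y][x] * v[1] for m, v in zip(masks, segment_flows))
            # Residual part: per-pixel correction added on top.
            row.append((dx + residual[y][x][0], dy + residual[y][x][1]))
        flow.append(row)
    return flow


def l1_flow_error(predicted, target):
    """Mean absolute difference between two H x W flow fields."""
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(predicted, target):
        for (pdx, pdy), (tdx, tdy) in zip(pred_row, tgt_row):
            total += abs(pdx - tdx) + abs(pdy - tdy)
            count += 2
    return total / count


# Toy 1x2 frame: left pixel is background, right pixel is foreground.
masks = [[[1.0, 0.0]], [[0.0, 1.0]]]
segment_flows = [(0.0, 0.0), (2.0, 0.0)]
# The foreground pixel moves slightly differently from its segment's flow.
residual = [[(0.0, 0.0), (0.5, -0.5)]]
observed = [[(0.0, 0.0), (2.5, -0.5)]]

flow = reconstruct_flow(masks, segment_flows, residual)
error = l1_flow_error(flow, observed)
```

Under this assumption, the masks do not have to distort to explain non-rigid motion, because the residual term can account for the mismatch between the observed flow and the per-segment constant flow.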
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Average Pooling · Residual Block · Batch Normalization · 1x1 Convolution · Convolution · Global Average Pooling · Kaiming Initialization · Residual Connection · Bottleneck Residual Block
