In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

Xiao Pan; Peike Li; Zongxin Yang; Huiling Zhou; Chang Zhou; Hongxia Yang; Jingren Zhou; Yi Yang

arXiv:2203.15312·cs.CV·October 25, 2022

TL;DR

This paper introduces In-aNd-Out (INO) generative learning, a unified framework that combines image-level and pixel-level optimization for unsupervised video object segmentation with Vision Transformers, and reports state-of-the-art results on DAVIS-2017 and YouTube-VOS.

Contribution

The paper proposes INO generative learning, an approach that unifies image-level and pixel-level optimization in a single framework for VOS.

Findings

01 Outperforms previous state-of-the-art methods on DAVIS-2017 and YouTube-VOS datasets.

02 Effectively captures high-level semantics and fine-grained details.

03 Enhances temporal consistency in video segmentation.

Abstract

In this paper, we focus on unsupervised learning for Video Object Segmentation (VOS), which learns visual correspondence (i.e., the similarity between pixel-level features) from unlabeled videos. Previous methods are mainly based on the contrastive learning paradigm and optimize at either the image level or the pixel level. Image-level optimization (e.g., on the spatially pooled feature of ResNet) learns robust high-level semantics but is sub-optimal because the pixel-level features are optimized only implicitly. By contrast, pixel-level optimization is more explicit; however, it is sensitive to the visual quality of the training data and is not robust to object deformation. To perform these two levels of optimization complementarily in a unified framework, we propose In-aNd-Out (INO) generative learning from a purely generative perspective, with the help of naturally designed class tokens and patch…
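
To make "visual correspondence" concrete: correspondence-based VOS methods are commonly evaluated by propagating a reference-frame mask to a later frame, weighting the reference labels by the similarity between pixel-level features. The sketch below illustrates that general idea in NumPy. It is not the INO training objective or the authors' inference code; the function name `propagate_labels`, the temperature value, and the flat (pixels, channels) feature layout are assumptions made for illustration.

```python
import numpy as np

def propagate_labels(ref_feats, tgt_feats, ref_labels, temperature=0.07):
    """Propagate per-pixel labels from a reference frame to a target frame.

    ref_feats:  (N, D) features of N reference pixels (hypothetical layout)
    tgt_feats:  (M, D) features of M target pixels
    ref_labels: (N, C) one-hot (or soft) labels of the reference pixels
    returns:    (M, C) soft labels for the target pixels
    """
    eps = 1e-8  # guards against division by zero for all-zero features
    # L2-normalize so the dot product below is a cosine similarity.
    ref = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + eps)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + eps)

    # Affinity: similarity of every target pixel to every reference pixel.
    logits = (tgt @ ref.T) / temperature            # (M, N)

    # Row-wise softmax (max subtracted for numerical stability).
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each target pixel receives a similarity-weighted average of labels.
    return weights @ ref_labels                     # (M, C)
```

Taking `propagate_labels(...).argmax(axis=1)` gives a hard mask for the target frame. Better features yield sharper affinities, which is why the quality of the learned pixel-level features directly affects segmentation quality.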

Code & Models

Repositories

pansanity666/INO_VOS (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Linear Layer · Contrastive Learning · Softmax · Dropout · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Label Smoothing · Multi-Head Attention