MOSE: A New Dataset for Video Object Segmentation in Complex Scenes

Henghui Ding; Chang Liu; Shuting He; Xudong Jiang; Philip H.S. Torr; Song Bai

arXiv:2302.01872 · cs.CV · October 24, 2023 · 6 cites

Open Access · 1 Repo

TL;DR

The paper introduces MOSE, a challenging new video object segmentation dataset with complex scenes, and benchmarks existing methods on it, revealing significant performance gaps in such environments.

Contribution

The paper presents MOSE, a large-scale VOS dataset of complex scenes, and evaluates 18 existing methods on it, highlighting the need for algorithms that hold up in real-world scenarios.

Findings

01

Current VOS methods achieve only 59.4% J&F on MOSE, much lower than on existing datasets.

02

Existing algorithms struggle with occlusion and crowded scenes in MOSE.

03

The gap between the 90+% J&F reported on existing datasets and the 59.4% reached on MOSE indicates that segmentation in complex environments remains an open challenge.
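The J&F score quoted in these findings is not defined on this page. In the VOS literature it is usually read as the mean of region similarity J (mask intersection-over-union) and contour accuracy F (an F-measure over boundary pixels). The sketch below illustrates that reading for a single frame and a single object. It is a simplification, not the benchmark's evaluation code: masks are assumed to be nested lists of 0/1, the function names are hypothetical, and boundary pixels are matched only on exact overlap, whereas standard evaluation toolkits tolerate small boundary offsets.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    pairs = [(p, g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    # Two empty masks agree perfectly by convention.
    return 1.0 if union == 0 else inter / union


def boundary_pixels(mask):
    """Foreground pixels with a background or out-of-frame 4-neighbour."""
    h, w = len(mask), len(mask[0])
    pts = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not mask[ny][nx]:
                    pts.add((y, x))
                    break
    return pts


def boundary_f(pred, gt):
    """Simplified contour accuracy F (assumption: exact-overlap matching)."""
    bp, bg = boundary_pixels(pred), boundary_pixels(gt)
    if not bp and not bg:
        return 1.0
    if not bp or not bg:
        return 0.0
    matched = len(bp & bg)
    precision = matched / len(bp)
    recall = matched / len(bg)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred, gt):
    """Mean of J and F for one object in one frame."""
    return (jaccard(pred, gt) + boundary_f(pred, gt)) / 2


# Toy 4x4 frame: the prediction spills one column past the ground truth.
gt = [[0, 0, 0, 0],
      [0, 1, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
score = j_and_f(pred, gt)
```

A reported benchmark number would be such per-frame scores averaged over all annotated objects and frames. The exact aggregation used for MOSE is not stated on this page.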

Abstract

Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence. State-of-the-art VOS methods have achieved excellent performance (e.g., 90+% J&F) on existing datasets. However, since the target objects in these existing datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied. To revisit VOS and make it more applicable in the real world, we collect a new VOS dataset called coMplex video Object SEgmentation (MOSE) to study tracking and segmenting objects in complex environments. MOSE contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks. The most notable feature of the MOSE dataset is complex scenes with crowded and occluded objects. The target objects in the videos are commonly occluded by others and disappear in…
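The dataset counts in the abstract give a rough sense of scale per clip. The short calculation below derives averages from those three figures only. The variable names are illustrative, and reading masks-per-object as annotated frames per object assumes one mask per object per annotated frame, which this page does not state.

```python
# Counts taken from the abstract.
NUM_VIDEOS = 2_149
NUM_OBJECTS = 5_200
NUM_MASKS = 431_725

# Simple averages; the real distribution across clips is not given here.
objects_per_video = NUM_OBJECTS / NUM_VIDEOS
masks_per_video = NUM_MASKS / NUM_VIDEOS
masks_per_object = NUM_MASKS / NUM_OBJECTS

print(f"objects per video: {objects_per_video:.2f}")
print(f"masks per video:   {masks_per_video:.1f}")
print(f"masks per object:  {masks_per_object:.1f}")
```

On average this works out to roughly 2.4 annotated objects and about 200 masks per clip.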

Code & Models

Repositories

henghuiding/MOSE-api (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: VOS · Contrastive Language-Image Pre-training