LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation

Lingyi Hong; Zhongying Liu; Wenchao Chen; Chenzhi Tan; Yuang Feng; Xinyu Zhou; Pinxue Guo; Jinglun Li; Zhaoyu Chen; Shuyong Gao; Wei Zhang; Wenqiang Zhang

arXiv:2404.19326·cs.CV·May 2, 2024·1 cites

TL;DR

This paper introduces LVOS, a large-scale benchmark dataset with long-duration videos to evaluate and improve long-term video object segmentation models in realistic scenarios.

Contribution

The paper presents LVOS, a new benchmark dataset of 720 videos averaging 1.14 minutes each, designed to better reflect real-world long-term VOS challenges.

Findings

01

Existing VOS models show significant performance drops on LVOS.

02

Longer videos make accurate object tracking and segmentation harder.

03

LVOS highlights the need for models to handle reappearance and cross-temporal similarities.

Abstract

Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets restricts further investigation of VOS in realistic scenarios. Thus, we propose a novel benchmark named LVOS, comprising 720 videos with 296,401 frames and 407,945 high-quality annotations. Videos in LVOS last 1.14 minutes on average, approximately 5 times longer than videos in existing datasets. Each video includes various attributes, especially challenges deriving from the wild, such as long-term reappearing and cross-temporal similar objects. Compared to previous benchmarks, our LVOS better…
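
As a rough consistency check on the figures quoted in the abstract, the following minimal Python sketch derives per-video and per-frame averages from the stated totals. The function and variable names are illustrative and do not come from the paper or its repository. The implied frame rate assumes the counted frames are spread evenly over the stated average duration; it is an estimate derived here, not a number given in the abstract.

```python
# Figures quoted in the abstract.
NUM_VIDEOS = 720
NUM_FRAMES = 296_401
NUM_ANNOTATIONS = 407_945
MEAN_DURATION_MIN = 1.14


def summarize_lvos_stats():
    """Derive simple averages from the totals stated in the abstract.

    The names and the uniform-spacing assumption are illustrative only.
    """
    mean_duration_s = MEAN_DURATION_MIN * 60.0
    frames_per_video = NUM_FRAMES / NUM_VIDEOS
    return {
        # Average number of frames in one video.
        "frames_per_video": frames_per_video,
        # Average number of annotations attached to one frame.
        "annotations_per_frame": NUM_ANNOTATIONS / NUM_FRAMES,
        # Frame rate implied by the average length (an estimate).
        "implied_fps": frames_per_video / mean_duration_s,
        # Total footage, assuming the mean duration applies to all videos.
        "total_hours": NUM_VIDEOS * MEAN_DURATION_MIN / 60.0,
    }


if __name__ == "__main__":
    for name, value in summarize_lvos_stats().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the totals work out to roughly 412 frames per video and about 6 frames per second, with somewhat more than one annotation per frame on average, which would be consistent with some frames containing more than one annotated object.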

Code & Models

Repositories

LingyiHongfd/LVOS

Taxonomy

Topics: Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: VOS · Focus