PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Henghui Ding; Chang Liu; Yunchao Wei; Nikhila Ravi; Shuting He; Song Bai; Philip Torr; Deshui Miao; Xin Li; Zhenyu He; Yaowei Wang; Ming-Hsuan Yang; Zhensong Xu; Jiangtao Yao; Chengjing Wu; Ting Liu; Luoqi Liu; Xinyu Liu; Jing Zhang; Kexin Zhang; Yuting Yang; Licheng Jiao; Shuyuan Yang; Mingqi Gao; Jingnan Luo; Jinyu Yang; Jungong Han; Feng Zheng; Bin Cao; Yisi Zhang; Xuanxu Lin; Xingjian He; Bo Zhao; Jing Liu; Feiyu Pan; Hao Fang; Xiankai Lu

arXiv:2406.17005 · cs.CV · June 26, 2024

TL;DR

The PVUW 2024 Challenge on Complex Video Understanding introduces new tracks and datasets to advance pixel-level video understanding in complex, real-world scenarios, fostering the development of robust video segmentation methods.

Contribution

This paper presents two new challenge tracks and their datasets: Complex Video Object Segmentation on MOSE and natural language (motion expression) guided video segmentation on MeViS, expanding the scope of video understanding research.

Findings

1. High participation in both challenge tracks.
2. Effective methods developed for complex video segmentation.
3. Rich datasets enabling advanced research in video understanding.

Abstract

The Pixel-level Video Understanding in the Wild (PVUW) Challenge focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. For the two new tracks, we provide additional videos and annotations featuring challenging elements; in MOSE, these include the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments. Moreover, we provide MeViS, a new motion expression guided video segmentation dataset, to study natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic…
