Video Object Segmentation and Tracking: A Survey
Rui Yao, Guosheng Lin, Shixiong Xia, Jiaqi Zhao, and Yong Zhou

TL;DR
This survey comprehensively reviews state-of-the-art methods in video object segmentation and tracking, highlighting their challenges, classifications, datasets, evaluation metrics, and future research directions.
Contribution
It provides a hierarchical classification of video object segmentation and tracking (VOST) methods, a detailed analysis of their technical features, and insights into datasets and evaluation metrics.
Findings
Classified VOST methods into categories such as unsupervised, semi-supervised, and interactive.
Summarized characteristics of video datasets and evaluation metrics.
Identified future research directions in VOST.
Abstract
Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics must handle some common challenges, such as occlusion, deformation, motion blur, and scale variation. The former additionally faces heterogeneous objects, interacting objects, edge ambiguity, and shape complexity, while the latter suffers from difficulties in handling fast motion, out-of-view targets, and real-time processing. Combining the two problems as video object segmentation and tracking (VOST) can overcome their respective difficulties and improve their performance. VOST has many practical applications, such as video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This article aims to provide a comprehensive review of the state-of-the-art tracking methods, and classify these methods into different…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
