Real-time Human-Centric Segmentation for Complex Video Scenes
Ran Yu, Chenyu Tian, Weihao Xia, Xinyuan Zhao, Haoqian Wang, Yujiu Yang

TL;DR
This paper introduces HVISNet, a real-time framework for segmenting and tracking all humans in complex videos, along with a new benchmark dataset, and reports superior accuracy, especially in occluded scenarios.
Contribution
The paper presents HVISNet, a novel one-stage detector-based framework for segmenting and tracking all humans in videos, and introduces the HVIS benchmark dataset for complex scenes.
Findings
HVISNet outperforms state-of-the-art methods in accuracy while running at real-time inference speed (30 FPS), especially on complex video scenes.
Inner Center Sampling improves segmentation accuracy, especially in occlusions.
The HVIS dataset contains 1447 human instance masks in 805 high-resolution videos of diverse scenes, intended for evaluating complex scenes.
Abstract
Most existing video tasks related to "human" focus on the segmentation of salient humans, ignoring the unspecified others in the video. Few studies have focused on segmenting and tracking all humans in a complex video, including pedestrians and humans of other states (e.g., seated, riding, or occluded). In this paper, we propose a novel framework, abbreviated as HVISNet, that segments and tracks all presented people in given videos based on a one-stage detector. To better evaluate complex scenes, we offer a new benchmark called HVIS (Human Video Instance Segmentation), which comprises 1447 human instance masks in 805 high-resolution videos in diverse scenes. Extensive experiments show that our proposed HVISNet outperforms the state-of-the-art methods in terms of accuracy at a real-time inference speed (30 FPS), especially on complex video scenes. We also notice that using the center of…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
