Flow-based Video Segmentation for Human Head and Shoulders
Zijian Kuang, Xinran Tie

TL;DR
This paper introduces FUNet, a flow-based encoder-decoder network that combines optical flow and CNNs for real-time human head and shoulders video segmentation, addressing motion blur challenges in videoconferencing.
Contribution
The paper presents a novel flow-based network architecture and a new dataset for real-time head and shoulders video segmentation, improving robustness against motion blur.
Findings
Effective real-time segmentation under motion blur
Combines optical flow with CNNs for robustness
Provides a new dataset for benchmarking
Abstract
Video segmentation for the human head and shoulders is essential in creating elegant media for videoconferencing and virtual reality applications. The main challenge is to perform high-quality background subtraction in real time while handling the segmentation errors caused by motion blur, e.g., when a speaker shakes their head or waves their hands during a video conference. To overcome the motion blur problem in video segmentation, we propose a novel flow-based encoder-decoder network (FUNet) that combines the traditional Horn-Schunck optical-flow estimation technique with convolutional neural networks to perform robust real-time video segmentation. We also introduce a video and image segmentation dataset: ConferenceVideoSegmentationDataset. Code and pre-trained models are available on our GitHub repository: https://github.com/kuangzijian/Flow-Based-Video-Matting.
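To make the optical-flow half of the abstract concrete, the following is a minimal NumPy sketch of the classical Horn-Schunck iteration on two grayscale frames. It is an illustration under stated assumptions, not the authors' implementation: the function names (`horn_schunck`, `_filter3x3`), the default `alpha` and `n_iter` values, and the use of `np.gradient` for spatial derivatives are all choices made here. How FUNet feeds the resulting flow into its encoder-decoder is not described in this listing and is not shown.

```python
import numpy as np

# Weighted neighbourhood average from the Horn-Schunck formulation
# (centre pixel excluded). The kernel is symmetric, so correlation
# and convolution give the same result.
AVG_KERNEL = np.array([[1 / 12, 1 / 6, 1 / 12],
                       [1 / 6,  0.0,   1 / 6],
                       [1 / 12, 1 / 6, 1 / 12]])


def _filter3x3(img, kernel):
    """Apply a 3x3 kernel with edge-replicated padding (hypothetical helper)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    """Estimate dense flow (u, v) from frame1 to frame2.

    alpha weights the smoothness term; n_iter is the number of
    Jacobi-style update sweeps. Both defaults are assumptions.
    """
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)

    # Spatial derivatives of the averaged frames; np.gradient returns
    # the derivative along rows (y) first, then along columns (x).
    Iy, Ix = np.gradient((f1 + f2) / 2.0)
    # Temporal derivative as a simple frame difference.
    It = f2 - f1

    u = np.zeros_like(f1)  # horizontal flow
    v = np.zeros_like(f1)  # vertical flow
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2

    for _ in range(n_iter):
        u_avg = _filter3x3(u, AVG_KERNEL)
        v_avg = _filter3x3(v, AVG_KERNEL)
        # Shared factor of the two Horn-Schunck update equations.
        common = (Ix * u_avg + Iy * v_avg + It) / denom
        u = u_avg - Ix * common
        v = v_avg - Iy * common
    return u, v
```

Calling `horn_schunck(prev_gray, next_gray)` on two consecutive grayscale frames returns two arrays with the same shape as the input. A larger `alpha` yields a smoother flow field at the cost of detail around moving edges, which is the trade-off that matters when a head or hand is blurred by motion.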
Taxonomy
Topics: Advanced Image Processing Techniques · Human Pose and Action Recognition · Advanced Vision and Imaging
