PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset

Lutao Chu; Yi Liu; Zewu Wu; Shiyu Tang; Guowei Chen; Yuying Hao; Juncai Peng; Zhiliang Yu; Zeyu Chen; Baohua Lai; Haoyi Xiong

arXiv:2112.07146·cs.CV·December 15, 2021

TL;DR

This paper introduces PP-HumanSeg, an open-source solution that combines a large-scale video portrait dataset with a connectivity-aware learning method for real-time portrait segmentation in video conferencing.

Contribution

The paper provides what the authors describe as the first large-scale video portrait dataset, and proposes a semantic connectivity-aware loss together with an ultra-lightweight model for improved segmentation.

Findings

01 Semantic Connectivity-aware Learning (SCL) improves segmentation quality and connectivity.

02 The dataset contains 14K fine-labeled frames from 291 videos across 23 conference scenes.

03 The model achieves a good balance of accuracy and inference speed.

Abstract

As the COVID-19 pandemic rampages across the world, the demand for video conferencing surges. To this end, real-time portrait segmentation has become a popular feature for replacing the backgrounds of conferencing participants. While feature-rich datasets, models and algorithms have been offered for segmentation that extracts body postures from life scenes, portrait segmentation has not yet been well covered in a video conferencing context. To facilitate progress in this field, we introduce an open-source solution named PP-HumanSeg. This work is the first to construct a large-scale video portrait dataset, which contains 291 videos from 23 conference scenes with 14K fine-labeled frames and extensions to multi-camera teleconferencing. Furthermore, we propose a novel Semantic Connectivity-aware Learning (SCL) for semantic segmentation, which introduces a semantic connectivity-aware loss to improve…

Code & Models

Repositories

PaddlePaddle/PaddleSeg (Paddle, official)

Taxonomy

Topics: Telemedicine and Telehealth Implementation · COVID-19 diagnosis using AI · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings