OpenHuman4D: Open-Vocabulary 4D Human Parsing

Keito Suzuki; Bang Du; Runfa Blark Li; Kunyao Chen; Lei Wang; Peng Liu; Ning Bi; Truong Nguyen

arXiv:2507.09880·cs.CV·July 29, 2025

TL;DR

This paper introduces a 4D human parsing framework that performs open-vocabulary human part segmentation in dynamic human-centric videos, reducing inference time (up to 93.3% acceleration over previous methods) and handling part classes unseen during training.

Contribution

It extends open-vocabulary 3D human parsing to 4D human-centric videos through three components (mask-based video object tracking, a Mask Validation module, and embedding fusion), enabling efficient and flexible human part segmentation.
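
The summary does not say how embedding fusion or open-vocabulary labeling is carried out. A common pattern in open-vocabulary segmentation is to fuse the per-frame embeddings of one tracked part mask into a single vector and then assign the text label whose embedding has the highest cosine similarity. The sketch below illustrates that pattern only; the mean-of-normalized-embeddings fusion rule and the names `fuse_embeddings` and `assign_label` are hypothetical assumptions, not the authors' method, and the toy vectors stand in for real image and text encoder outputs.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def fuse_embeddings(per_frame: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion rule: average the L2-normalized per-frame
    embeddings of one tracked part mask, then re-normalize."""
    if not per_frame:
        raise ValueError("need at least one per-frame embedding")
    dim = len(per_frame[0])
    normed = [l2_normalize(e) for e in per_frame]
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def assign_label(mask_embedding: Sequence[float],
                 text_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the text query whose embedding has the highest cosine
    similarity with the fused mask embedding."""
    query = l2_normalize(mask_embedding)
    best_label, best_score = "", -math.inf
    for label, text_vec in text_embeddings.items():
        score = sum(a * b for a, b in zip(query, l2_normalize(text_vec)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy example: a tracked mask observed in two frames, three text queries.
texts = {"left arm": [1.0, 0.0, 0.0], "head": [0.0, 1.0, 0.0], "torso": [0.0, 0.0, 1.0]}
fused = fuse_embeddings([[0.9, 0.1, 0.0], [0.8, 0.3, 0.1]])
print(assign_label(fused, texts))  # prints "left arm"
```

Because the label set is just a dictionary of text embeddings, new part names can be queried at inference time without retraining, which is the property "open-vocabulary" refers to.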

Findings

01. Achieves up to 93.3% acceleration over previous methods
02. Supports open-vocabulary 4D human parsing
03. Demonstrates effectiveness on 4D human-centric datasets

Abstract

Understanding dynamic 3D human representation has become increasingly critical in virtual and extended reality applications. However, existing human part segmentation methods are constrained by reliance on closed-set datasets and prolonged inference times, which significantly restrict their applicability. In this paper, we introduce the first 4D human parsing framework that simultaneously addresses these challenges by reducing the inference time and introducing open-vocabulary capabilities. Building upon state-of-the-art open-vocabulary 3D human parsing techniques, our approach extends the support to 4D human-centric video with three key innovations: 1) We adopt mask-based video object tracking to efficiently establish spatial and temporal correspondences, avoiding the necessity of segmenting all frames. 2) A novel Mask Validation module is designed to manage new target identification…
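
The abstract is truncated before it explains how the Mask Validation module identifies new targets. One plausible way to frame the problem: a mask produced by a segmenter on the current frame is treated as a new target when no mask propagated by the video tracker overlaps it sufficiently. The sketch below implements that intersection-over-union (IoU) test under stated assumptions. The 0.5 threshold, the set-of-pixels mask representation, and the names `find_new_targets` and `box` are all hypothetical illustrations, not details taken from the paper.

```python
from typing import FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # a binary mask stored as the set of its (row, col) pixels


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two masks (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_new_targets(tracked: Sequence[Mask],
                     candidates: Sequence[Mask],
                     iou_threshold: float = 0.5) -> List[int]:
    """Return indices of candidate masks that no tracked mask explains,
    i.e. whose best IoU with every tracked mask is below the threshold."""
    new_indices = []
    for idx, cand in enumerate(candidates):
        best = max((iou(cand, t) for t in tracked), default=0.0)
        if best < iou_threshold:
            new_indices.append(idx)
    return new_indices


# One part is already tracked; the segmenter proposes two masks on this frame.
tracked = [box(0, 0, 4, 4)]
candidates = [box(0, 0, 4, 5), box(10, 10, 12, 12)]
print(find_new_targets(tracked, candidates))  # prints [1]
```

Only masks flagged as new would need to be added to the tracker, which fits the abstract's motivation of avoiding full segmentation of every frame.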
