MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization

Chenghong Li; Hongjie Liao; Yihao Zhi; Xihe Yang; Zhengwentai Sun; Jiahao Chang; Shuguang Cui; Xiaoguang Han

arXiv:2505.01838·cs.CV·May 6, 2025

TL;DR

MVHumanNet++ is a comprehensive large-scale dataset of multi-view daily human captures with extensive annotations, designed to advance 3D human digitization and related tasks.

Contribution

The paper introduces MVHumanNet++, the largest-scale 3D human dataset with diverse identities, clothing, and rich annotations, enabling new research opportunities in human-centric 3D vision.

Findings

01. Demonstrated performance improvements in 2D and 3D tasks using the dataset.

02. Showcased the dataset's utility across various human-centric visual applications.

03. Provided extensive annotations including masks, keypoints, SMPL parameters, and textual descriptions.

Abstract

In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while significant progress has been achieved in object-centric tasks through large-scale datasets like Objaverse and MVImgNet, human-centric tasks have seen limited advancement, largely due to the absence of a comparable large-scale human dataset. To bridge this gap, we present MVHumanNet++, a dataset that comprises multi-view human action sequences of 4,500 human identities. The primary focus of our work is on collecting human data that features a large number of diverse identities and everyday clothing using multi-view human capture systems, which facilitates easily scalable data collection. Our dataset contains 9,000 daily outfits, 60,000 motion sequences and 645 million frames with extensive annotations,…

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · 3D Shape Modeling and Analysis

Methods: Focus