A New People-Object Interaction Dataset and NVS Benchmarks
Shuai Guo, Houqiang Zhong, Qiuwen Wang, Ziyu Chen, Yijie Gao, Jiajing Yuan, Chenyu Zhang, Rong Xie, and Li Song

TL;DR
This paper introduces a multi-view RGB-D dataset of human-object interaction, along with benchmarks for novel view synthesis (NVS), addressing the limited views, low resolution, and poor synchronization of existing datasets.
Contribution
The paper contributes a synchronized 30-view RGB-D dataset of single-person and multi-person interactions with objects, together with NVS benchmarks evaluated on it.
Findings
The dataset comprises 38 series of 30-view RGB-D videos in 4K resolution, accompanied by camera parameters, foreground masks, SMPL models, some point clouds, and mesh files.
Evaluation of SOTA NVS models on the dataset establishes baseline benchmarks.
The dataset supports research on multi-person interactions and high-quality view synthesis, where existing datasets are limited by complex lighting, poor synchronization, and low resolution.
Abstract
Recently, novel view synthesis (NVS) in human-object interaction scenes has received increasing attention. Existing human-object interaction datasets mainly consist of static data with limited views, offering only RGB images or videos, mostly containing interactions between a single person and objects. Moreover, these datasets exhibit complexities in lighting environments, poor synchronization, and low resolution, hindering high-quality human-object interaction studies. In this paper, we introduce a new people-object interaction dataset that comprises 38 series of 30-view multi-person or single-person RGB-D video sequences, accompanied by camera parameters, foreground masks, SMPL models, some point clouds, and mesh files. Video sequences are captured by 30 Kinect Azures, uniformly surrounding the scene, each in 4K resolution at 25 FPS, and lasting for 119 seconds. Meanwhile, we evaluate some SOTA NVS…
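To give a rough sense of scale for one capture, the per-series frame count follows directly from the view count and frame rate quoted in the abstract. The sketch below is only illustrative: the function name and the 10-second clip length are assumptions chosen for the example, not figures or code from the paper, and the duration is deliberately left as a parameter.

```python
def frames_per_series(num_views: int, fps: int, duration_s: float) -> int:
    """Count color frames in one multi-view series.

    Assumes every camera records one frame per tick for the full clip,
    so the total is views x (fps x duration).
    """
    frames_per_view = round(fps * duration_s)
    return num_views * frames_per_view


NUM_VIEWS = 30           # from the abstract: 30 cameras around the scene
FPS = 25                 # from the abstract: 25 FPS per camera
EXAMPLE_DURATION_S = 10  # hypothetical clip length, for illustration only

total = frames_per_series(NUM_VIEWS, FPS, EXAMPLE_DURATION_S)
print(f"{total} color frames for a {EXAMPLE_DURATION_S}-second series")
```

Because the sequences are RGB-D, each color frame would typically be paired with a depth frame, so the storage footprint is larger than the color frame count alone suggests.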
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Human Pose and Action Recognition
