VOccl3D: A Video Benchmark Dataset for 3D Human Pose and Shape Estimation under real Occlusions

Yash Garg; Saketh Bachu; Arindam Dutta; Rohit Lal; Sarosij Bose; Calvin-Khang Ta; M. Salman Asif; Amit Roy-Chowdhury

arXiv:2508.06757 · cs.CV · September 18, 2025

TL;DR

VOccl3D is a new video dataset with realistic occlusions for benchmarking 3D human pose and shape estimation, enabling improved method development under challenging real-world conditions.

Contribution

The paper introduces VOccl3D, a dataset with realistic occlusions for 3D human pose and shape estimation, and demonstrates its utility by fine-tuning state-of-the-art methods and improving their performance.

Findings

01. Fine-tuning on VOccl3D improves the performance of HPS methods.

02. Human detection under occlusion improves with a fine-tuned YOLO11 detector.

03. VOccl3D provides a realistic benchmark for occlusion robustness.

Abstract

Human pose and shape (HPS) estimation methods have been extensively studied, with many demonstrating high zero-shot performance on in-the-wild images and videos. However, these methods often struggle in challenging scenarios involving complex human poses or significant occlusions. Although some studies address 3D human pose estimation under occlusion, they typically evaluate performance on datasets that lack realistic or substantial occlusions; for example, most existing datasets introduce occlusions with random patches over the human or clipart-style overlays, which may not reflect real-world challenges. To bridge this gap in realistic occlusion datasets, we introduce a novel benchmark dataset, VOccl3D, a Video-based human Occlusion dataset with 3D body pose and shape annotations. Inspired by works such as AGORA and BEDLAM, we constructed this dataset using advanced computer graphics…
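The abstract contrasts VOccl3D with datasets that occlude the human using random patches. As a minimal illustration of that baseline (not code from the paper or from any specific dataset), the sketch below pastes one solid rectangle at a random location inside a person bounding box. The function name `random_patch_occlusion`, the `(x0, y0, x1, y1)` box convention with exclusive upper bounds, the grayscale list-of-lists image, and the `max_frac` size cap are all hypothetical choices made for this example.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]           # grayscale image as rows of pixel values
BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def random_patch_occlusion(
    image: Image,
    person_bbox: BBox,
    max_frac: float = 0.5,
    fill: int = 0,
    rng: Optional[random.Random] = None,
) -> Tuple[Image, BBox]:
    """Hypothetical random-patch occluder: paste one solid rectangle
    inside the person box. Returns a copy of the image and the patch box;
    the input image is left unmodified."""
    assert 0 < max_frac <= 1, "patch must fit inside the person box"
    rng = rng or random.Random()
    x0, y0, x1, y1 = person_bbox
    bw, bh = x1 - x0, y1 - y0
    # Patch side lengths: at least 1 pixel, at most max_frac of each box side.
    pw = rng.randint(1, max(1, int(bw * max_frac)))
    ph = rng.randint(1, max(1, int(bh * max_frac)))
    # Top-left corner chosen so the whole patch stays inside the person box.
    px = rng.randint(x0, x1 - pw)
    py = rng.randint(y0, y1 - ph)
    out = [row[:] for row in image]  # copy rows so the caller's image is untouched
    for y in range(py, py + ph):
        for x in range(px, px + pw):
            out[y][x] = fill
    return out, (px, py, px + pw, py + ph)


if __name__ == "__main__":
    img = [[255] * 8 for _ in range(6)]
    occluded, patch = random_patch_occlusion(img, (2, 1, 6, 5), rng=random.Random(0))
    print("patch box:", patch)
```

Because the patch ignores scene geometry, depth ordering, and lighting, it illustrates why such synthetic occlusions may not reflect the real-world challenges that VOccl3D aims to capture.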
