Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation
Xinhao Hu, Yiyi Zhang, Liqing Zhang, Jianfu Zhang

TL;DR
This paper introduces a controllable generative augmentation framework for 3D human pose estimation that synthesizes diverse video data to improve model generalization across domains.
Contribution
The paper presents a controllable pose generation method that systematically varies poses, backgrounds, and camera viewpoints to enrich training data for better domain generalization.
Findings
Augmented datasets significantly improve performance on unseen scenarios.
Cross-domain data fusion enhances model robustness.
Controllable video generation effectively addresses domain gaps.
Abstract
Pedestrian motion, due to its causal nature, is strongly influenced by domain gaps arising from discrepancies between training and testing data distributions. Focusing on 3D human pose estimation, this work presents a controllable human pose generation framework that synthesizes diverse video data by systematically varying poses, backgrounds, and camera viewpoints. This generative augmentation enriches training datasets, enhances model generalization, and alleviates the limitations of existing methods in handling domain discrepancies. By leveraging both indoor/real-world and outdoor/virtual datasets, we perform cross-domain data fusion and controllable video generation to construct enriched training data, tailored to realistic deployment settings. Extensive experiments show that the augmented datasets significantly improve model performance on unseen scenarios and datasets, validating…
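To make the three control axes named in the abstract (pose, background, camera viewpoint) concrete, the sketch below enumerates every combination of them as a training sample specification. It is an illustration only and not the paper's generative model: the function names, the yaw-only camera rotation, the pinhole projection with `focal` and `depth`, and the toy three-joint skeleton are all assumptions introduced here.

```python
import math
from itertools import product

def rotate_y(joints, yaw_deg):
    """Rotate 3D joints (list of (x, y, z)) about the vertical y-axis."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

def project(joints, focal=1000.0, depth=5.0):
    """Pinhole projection after shifting the skeleton `depth` units along +z.

    `focal` and `depth` are hypothetical camera parameters for illustration.
    """
    return [(focal * x / (z + depth), focal * y / (z + depth)) for x, y, z in joints]

def augmentation_grid(poses, backgrounds, yaws):
    """Yield one sample spec per (pose, background, yaw) combination."""
    for (pose_id, joints), bg, yaw in product(poses.items(), backgrounds, yaws):
        rotated = rotate_y(joints, yaw)
        yield {
            "pose": pose_id,
            "background": bg,  # label only; a real pipeline would render or generate it
            "yaw": yaw,
            "joints_3d": rotated,
            "joints_2d": project(rotated),
        }

# Toy skeleton: root plus two hands, in metres (hypothetical data).
poses = {"t_pose": [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]}
samples = list(augmentation_grid(poses, ["indoor", "outdoor"], [0, 90, 180]))
```

Each yielded dictionary pairs rotated 3D joints with their 2D projection, which is the kind of paired supervision a 3D pose estimator trains on. In the framework the abstract describes, a controllable video generator would synthesize the actual frames for each combination; here the background is carried only as a label.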
