TL;DR
This paper introduces a geometry-conditioned diffusion model called Pose-LDM that synthesizes occluded human images from skeletal keypoints, improving in-bed pose estimation under blanket occlusion without requiring paired supervision.
Contribution
The work proposes Pose-LDM, a pose-conditioned latent diffusion model that generates blanket-covered images directly from skeletal keypoints and uses them as occlusion-aware augmentation to make in-bed pose estimation more robust to blanket occlusion.
Findings
Pose-LDM achieves the highest localization accuracy under severe occlusion.
Pose-LDM maintains detection performance comparable to paired diffusion-based translation.
Pose-LDM comes close to fully supervised training performance without requiring paired data.
Abstract
Robust in-bed human pose estimation under blanket occlusion remains challenging due to the scarcity of reliable labeled training data for heavily covered poses. Existing approaches rely on multi-modal sensing or image-to-image translation frameworks that remain conditioned on visible source imagery, limiting scalability and pose diversity. In this work, we reformulate occlusion-aware augmentation as a geometry-conditioned generative modeling task. We conduct a systematic comparison of deterministic masking, unpaired translation, paired diffusion-based translation, and a proposed pose-conditioned Latent Diffusion Model (Pose-LDM). Unlike image-guided methods, Pose-LDM synthesizes blanket-covered images directly from skeletal keypoints, eliminating dependence on paired supervision and pixel-level source-image conditioning while enabling generation from arbitrary pose inputs. All…
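The abstract does not say how the skeletal keypoints are encoded before they condition the diffusion model. One encoding often used in pose-conditioned generation is a stack of per-joint Gaussian heatmaps. The sketch below illustrates only that general idea: the function name `keypoints_to_heatmaps`, the `sigma` default, and the (x, y) pixel convention are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render K keypoints, given as (x, y) pixel coordinates, into a
    (K, height, width) stack of Gaussian heatmaps.

    Hypothetical helper: the paper's actual conditioning format is not
    described in the abstract.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)  # shape (K, 2)

    # Pixel coordinate grids, shaped so they broadcast against K joints.
    ys = np.arange(height, dtype=np.float32)[None, :, None]  # (1, H, 1)
    xs = np.arange(width, dtype=np.float32)[None, None, :]   # (1, 1, W)

    kx = keypoints[:, 0][:, None, None]  # (K, 1, 1)
    ky = keypoints[:, 1][:, None, None]  # (K, 1, 1)

    # Squared distance from every pixel to each joint, then a Gaussian bump.
    sq_dist = (xs - kx) ** 2 + (ys - ky) ** 2  # (K, H, W)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Two joints on a 16x16 canvas: each channel peaks at its own joint.
heatmaps = keypoints_to_heatmaps([(3, 5), (10, 2)], height=16, width=16)
```

A stack like this has the same spatial layout as an image, so it could be resized to the latent resolution and supplied to a denoiser as extra input channels. Whether Pose-LDM does this or uses another mechanism is not stated in the text above.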
