Pose-Aware Diffusion for 3D Generation
Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li

TL;DR
Pose-Aware Diffusion (PAD) is an end-to-end diffusion framework that synthesizes pose-aligned 3D objects directly in the observation space from monocular depth, avoiding the spatial mismatches and transformation ambiguities of decoupled canonical-then-rotate pipelines.
Contribution
PAD unprojects monocular depth into a partial point cloud and injects it as a 3D geometric anchor, explicitly enforcing spatial supervision to improve pose alignment and 3D generation fidelity.
Findings
PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods.
PAD produces high-fidelity, pose-aligned 3D assets.
PAD extends naturally to compositional scene reconstruction.
Abstract
Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated…
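The unprojection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, and the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration. It lifts each valid depth pixel into camera-space coordinates, yielding the kind of partial point cloud the abstract describes.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Lift an (H, W) depth map to an (N, 3) point cloud in camera coordinates.

    Assumes a pinhole camera (an assumption, not stated in the source):
        x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth[v, u]
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Keep only pixels with a finite, positive depth value.
    valid = np.isfinite(depth) & (depth > 0)
    if mask is not None:
        # Optional object mask restricts the cloud to the foreground.
        valid &= mask.astype(bool)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Usage: a flat 3x3 depth map at distance 2 with the principal point at the center.
points = unproject_depth(np.full((3, 3), 2.0), fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

Because the points stay in the camera (observation) frame, no canonical pose has to be estimated afterwards, which is the property the abstract emphasizes.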
