Pose-Aware Diffusion for 3D Generation

Zihan Zhou; Luxi Chen; Jingzhi Zhou; Yuhao Wan; Min Zhao; Baoyu Fan; Chongxuan Li

arXiv:2605.00345·cs.CV·May 4, 2026

TL;DR

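Pose-Aware Diffusion (PAD) is an end-to-end diffusion framework that synthesizes pose-aligned 3D objects directly in observation space, anchored by a partial point cloud unprojected from monocular depth. It avoids the spatial mismatches and transformation ambiguities of decoupled canonical-then-rotate pipelines.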

Contribution

PAD unprojects monocular depth into a partial point cloud and injects it as an explicit 3D geometric anchor. This enforces spatial supervision in observation space rather than in a canonical frame, improving pose alignment and 3D generation fidelity.

Findings

01

PAD achieves better geometric alignment and image-to-3D correspondence than state-of-the-art methods.

02

PAD produces high-fidelity, pose-aligned 3D assets.

03

PAD extends naturally to compositional scene reconstruction.

Abstract

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated…
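
The unprojection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard pinhole camera model with known intrinsics; the function name `unproject_depth`, the parameters `fx`, `fy`, `cx`, `cy`, and the optional object mask are illustrative assumptions, not details taken from the paper, and how PAD injects the resulting points into the diffusion model is not shown.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Lift a depth map to a partial point cloud in camera (observation) space.

    Assumes a pinhole camera: a pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    All names here are hypothetical, chosen for illustration only.
    """
    h, w = depth.shape
    # v indexes rows (image y), u indexes columns (image x).
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Keep only pixels with a usable depth value (and inside the mask, if given).
    valid = np.isfinite(depth) & (depth > 0)
    if mask is not None:
        valid &= mask.astype(bool)

    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # shape (N, 3)

# Synthetic example: a flat surface 2 units from the camera, one missing pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0  # treated as invalid and dropped
points = unproject_depth(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(points.shape)
```

Because the points stay in the camera frame, the cloud already carries the object's observed pose. It covers only the visible surface, which is why it is partial.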
