Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement
Xinyue Liang, Zhiyuan Ma, Lingchen Sun, Yanjun Guo, Lei Zhang

TL;DR
Photo3D introduces a framework that enhances photorealistic 3D generation by combining structure-aligned multi-view synthesis with detail enhancement. It uses images generated by GPT-4o-Image and a new detail-enhanced multi-view dataset paired with 3D geometry.
Contribution
The paper presents a structure-aligned multi-view synthesis pipeline and a universal detail enhancement scheme that improves the realism of generated 3D assets across different 3D-native generators.
Findings
Achieves state-of-the-art photorealistic 3D generation results.
Generalizes well across various 3D-native generation paradigms.
Effectively enhances details while maintaining structural consistency.
Abstract
Although recent 3D-native generators have made great progress in synthesizing reliable geometry, they still fall short of realistic appearance. A key obstacle is the lack of diverse, high-quality real-world 3D assets with rich texture details, since such data is intrinsically difficult to capture given the diverse scales of scenes, the non-rigid motion of objects, and the limited precision of 3D scanners. We introduce Photo3D, a framework for advancing photorealistic 3D generation that is driven by image data generated by the GPT-4o-Image model. Because the generated images lack multi-view consistency and can therefore distort 3D structures, we design a structure-aligned multi-view synthesis pipeline and construct a detail-enhanced multi-view dataset paired with 3D geometry. Building on this dataset, we present a realistic detail enhancement scheme that leverages…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
