ProcGen3D: Learning Neural Procedural Graph Representations for Image-to-3D Reconstruction

Xinyi Zhang; Daoyi Gao; Naiqi Li; Angela Dai

arXiv:2511.07142·cs.CV·November 11, 2025

Open Access·3 Reviews

TL;DR

ProcGen3D introduces a neural procedural graph-based method for 3D reconstruction from images, leveraging transformer models and MCTS to produce detailed, domain-specific 3D assets that outperform existing techniques.

Contribution

The paper presents a novel graph-based procedural representation and a transformer-based generative model with MCTS-guided sampling for improved image-to-3D reconstruction.
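To make the idea of MCTS-guided sampling concrete, here is a minimal, hedged sketch on a toy problem. It is not the paper's implementation: the uniform rollout policy stands in for the transformer prior, `TARGET` and `reward` stand in for an image-faithfulness score, and every name (`Node`, `simulate`, `mcts_generate`, `VOCAB`, `SEQ_LEN`) is hypothetical.

```python
import math
import random

VOCAB = [0, 1]          # toy token vocabulary (assumption)
SEQ_LEN = 4             # toy sequence length (assumption)
TARGET = [1, 0, 1, 1]   # stands in for the evidence in the input image


def reward(seq):
    """Toy faithfulness score: fraction of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET)) / SEQ_LEN


class Node:
    def __init__(self, prefix):
        self.prefix = prefix   # tokens generated so far
        self.children = {}     # token -> Node
        self.visits = 0
        self.total = 0.0       # sum of backed-up rewards


def rollout(prefix, rng):
    """Complete the sequence with a uniform stand-in for the learned prior."""
    seq = list(prefix)
    while len(seq) < SEQ_LEN:
        seq.append(rng.choice(VOCAB))
    return reward(seq)


def simulate(node, rng, c=1.4):
    """One MCTS iteration: select by UCT, expand one child, roll out, back up."""
    if len(node.prefix) == SEQ_LEN:
        value = reward(node.prefix)
    else:
        untried = [t for t in VOCAB if t not in node.children]
        if untried:
            tok = rng.choice(untried)
            child = Node(node.prefix + [tok])
            node.children[tok] = child
            value = rollout(child.prefix, rng)
            child.visits += 1
            child.total += value
        else:
            child = max(
                node.children.values(),
                key=lambda ch: ch.total / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
            value = simulate(child, rng, c)
    node.visits += 1
    node.total += value
    return value


def mcts_generate(simulations=200, seed=0):
    """Pick each next token as the most-visited child of a fresh search tree."""
    rng = random.Random(seed)
    prefix = []
    while len(prefix) < SEQ_LEN:
        root = Node(prefix)
        for _ in range(simulations):
            simulate(root, rng)
        best = max(root.children, key=lambda t: root.children[t].visits)
        prefix = prefix + [best]
    return prefix


result = mcts_generate()
```

In a real system the rollout would sample from the trained next-token model and the reward would compare a decoded 3D result against the input image; both are replaced here by trivial stand-ins so the search loop itself is easy to follow.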

Findings

01. Outperforms state-of-the-art 3D generative methods
02. Enables better generalization to real-world images
03. Effective across diverse object categories like cacti, trees, and bridges

Abstract

We introduce ProcGen3D, a new approach for 3D content creation by generating procedural graph abstractions of 3D objects, which can then be decoded into rich, complex 3D assets. Inspired by the prevalent use of procedural generators in production 3D applications, we propose a sequentialized, graph-based procedural graph representation for 3D assets. We use this to learn to approximate the landscape of a procedural generator for image-based 3D reconstruction. We employ edge-based tokenization to encode the procedural graphs, and train a transformer prior to predict the next token conditioned on an input RGB image. Crucially, to enable better alignment of our generated outputs to an input image, we incorporate Monte Carlo Tree Search (MCTS) guided sampling into our generation process, steering output procedural graphs towards more image-faithful reconstructions. Our approach is applicable…
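The edge-based tokenization mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the actual token layout, so everything here is an assumption: nodes are taken to be non-negative quantized 3D coordinates, each edge becomes six integer tokens, and the `EOS` value and the function names are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]   # quantized (x, y, z), assumed non-negative
Edge = Tuple[Node, Node]

EOS = -1  # hypothetical end-of-sequence token, outside the coordinate range


def tokenize_edges(edges: List[Edge]) -> List[int]:
    """Flatten each edge into six tokens (start xyz, end xyz), then append EOS."""
    tokens: List[int] = []
    for start, end in edges:
        tokens.extend(start)
        tokens.extend(end)
    tokens.append(EOS)
    return tokens


def detokenize_edges(tokens: List[int]) -> List[Edge]:
    """Invert tokenize_edges; stop at EOS and drop any trailing partial edge."""
    if EOS in tokens:
        tokens = tokens[: tokens.index(EOS)]
    usable = len(tokens) - len(tokens) % 6
    edges: List[Edge] = []
    for i in range(0, usable, 6):
        chunk = tokens[i : i + 6]
        edges.append((tuple(chunk[:3]), tuple(chunk[3:])))
    return edges


# A tiny Y-shaped "tree": one trunk edge that splits into two branch edges.
toy_edges = [
    ((4, 0, 4), (4, 4, 4)),
    ((4, 4, 4), (2, 7, 4)),
    ((4, 4, 4), (6, 7, 4)),
]
toy_tokens = tokenize_edges(toy_edges)
```

A flat integer sequence like `toy_tokens` is the kind of input an autoregressive next-token model can be trained on; conditioning on the RGB image is out of scope for this sketch.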

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 3

Strengths

1. Using a transformer model to learn procedural graph generation is interesting and allows a more compact representation.
2. MCTS-guided sampling is novel and experimentally performs better for complex local geometry.
3. The approach seems to generalize to real images based on qualitative evaluations.

Weaknesses

1. The approach is limited to a single-view image which is not sufficient for capturing the full geometry of the object.
2. The overall generation will be limited by the limitations of autoregressive models, i.e., overall structure will be limited by the order of generation and errors can easily propagate. There is no discussion in the paper on the robustness of the generation process to small early mistakes.
3. The overall dataset is very limited with just three categories. The set of real-imag

Reviewer 02·Rating 4·Confidence 3

Strengths

- As far as I know, this is the first work on image-conditioned 3D generation using procedural graphs as the 3D representation. Procedural graphs have many advantages, such as generality and being able to represent details well and succinctly.
- The construction of the Transformer-based generative model is sound.
- Experiments show good qualitative results.

Weaknesses

The weaknesses of the paper fall into two main points. First, I think that the experiments are incomplete.
- The model is trained separately on each category of objects. This calls the experimental results' generality into question. Was training done on a more diverse dataset?
- Wonder3D and TRELLIS were both trained on a diverse set of objects. Therefore, comparing to them in a category-specific manner is not quite fair to them. It would strengthen the paper to include another category-specific

Reviewer 03·Rating 4·Confidence 2

Strengths

Innovative Representation: The idea of learning procedural graphs as the latent 3D representation is novel and conceptually elegant. It bridges neural generative modeling and procedural graphics in a meaningful way.
Compact & Interpretable Outputs: Procedural graphs are lightweight and structured, providing interpretable intermediate representations rather than opaque neural fields or dense meshes.
Effective Image Alignment: The use of MCTS-guided sampling for test-time refinement is an origin

Weaknesses

Missing comparison: The paper doesn't compare to DI-PCG, which I believe is a very important, relevant baseline.
Limited real-world examples: Although the method claims to generalize to real-world examples, the number of such results is limited.
Limited Scope of Objects: The evaluated categories (trees, cacti, bridges) are all graph-structured and hierarchical. It's unclear how well the method extends to more complex or amorphous shapes (e.g., vehicles, furniture).
Computational Overhead of

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques