TL;DR
Graph-PiT introduces a graph-based prior that explicitly models the relationships among parts to improve the structural coherence of part-based image synthesis. It uses a hierarchical graph neural network and relation-aware losses.
Contribution
It proposes a framework that uses graph neural networks to model the spatial and semantic relationships among visual parts, which improves the structural integrity of generated images.
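The part-graph idea can be made concrete with a minimal plain-Python sketch: parts become nodes, pairwise spatial-semantic affinities become edge weights, and one round of neighbor aggregation refines each part embedding. The edge weighting (a spatial Gaussian kernel multiplied by clamped cosine similarity), the `sigma` value, and the helper names `build_part_graph` and `message_pass` are hypothetical illustrations. They are not the paper's published formulation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def build_part_graph(centers, embeddings, sigma=0.25):
    """Return a symmetric weighted adjacency matrix over parts (zero diagonal).

    centers:    list of (x, y) part centers in normalized image coordinates.
    embeddings: list of per-part semantic vectors.
    Hypothetical edge weight: spatial Gaussian kernel times the
    non-negative cosine similarity of the two semantic embeddings.
    """
    n = len(centers)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))
            semantic = max(0.0, cosine(embeddings[i], embeddings[j]))
            W[i][j] = W[j][i] = spatial * semantic
    return W

def message_pass(W, H):
    """One round of weighted-mean neighbor aggregation with a residual connection."""
    out = []
    for i, h in enumerate(H):
        total = sum(W[i])
        if total == 0:
            out.append(list(h))  # isolated part: keep its embedding unchanged
            continue
        agg = [sum(W[i][j] * H[j][k] for j in range(len(H))) / total
               for k in range(len(h))]
        out.append([a + b for a, b in zip(h, agg)])
    return out

# Illustrative parts: two nearby, related parts and one distant, unrelated part.
centers = [(0.5, 0.2), (0.5, 0.35), (0.9, 0.9)]
emb = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
W = build_part_graph(centers, emb)
H = message_pass(W, emb)
```

Because the weight multiplies a spatial and a semantic term, it is high only for parts that are both close together and related. An unordered-set treatment of parts, which the abstract criticizes, would weight all pairs equally or ignore them entirely.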
Findings
Improves structural coherence over vanilla PiT in synthetic domains.
Enhances transferability to real web images.
Explicit relational reasoning is crucial for enforcing adjacency constraints.
Abstract
Achieving fine-grained and structurally sound controllability is a cornerstone of advanced visual generation. Existing part-based frameworks treat user-provided parts as an unordered set and therefore ignore their intrinsic spatial and semantic relationships, which often results in compositions that lack structural integrity. To bridge this gap, we propose Graph-PiT, a framework that explicitly models the structural dependencies of visual components using a graph prior. Specifically, we represent visual parts as nodes and their spatial-semantic relationships as edges. At the heart of our method is a Hierarchical Graph Neural Network (HGNN) module that performs bidirectional message passing between coarse-grained part-level super-nodes and fine-grained IP+ token sub-nodes, refining part embeddings before they enter the generative pipeline. We also introduce a graph Laplacian smoothness…
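The two-level message passing and the smoothness term can be illustrated with a rough plain-Python sketch, assuming additive aggregation. Part-level super-nodes first absorb the mean of their own token sub-nodes (bottom-up), then exchange messages over the part graph, and finally push a scaled copy of the refined part embedding back to every token (top-down). The abstract is truncated before it defines the smoothness term, so the sketch assumes the standard unnormalized form tr(XᵀLX) = ½ Σᵢⱼ Wᵢⱼ‖xᵢ − xⱼ‖² with L = D − W. The helper names and the mixing weight `alpha` are hypothetical, and this is not the authors' HGNN.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [x + scale * y for x, y in zip(a, b)]

def hierarchical_pass(part_emb, tokens, W, alpha=0.5):
    """One bidirectional round between part super-nodes and token sub-nodes.

    part_emb: list of P part-level vectors (super-nodes).
    tokens:   list of P lists; tokens[p] holds the sub-node vectors of part p.
    W:        P x P symmetric adjacency among parts.
    alpha:    hypothetical weight on the top-down message.
    """
    # Bottom-up: each super-node absorbs the mean of its own sub-nodes.
    up = [add(part_emb[p], mean(tokens[p])) for p in range(len(part_emb))]
    # Lateral: super-nodes exchange weighted-mean messages over the part graph.
    lateral = []
    for i in range(len(up)):
        total = sum(W[i])
        msg = [0.0] * len(up[i])
        if total > 0:
            for j in range(len(up)):
                msg = add(msg, up[j], W[i][j] / total)
        lateral.append(add(up[i], msg))
    # Top-down: every sub-node receives a scaled copy of its refined parent.
    new_tokens = [[add(t, lateral[p], alpha) for t in tokens[p]]
                  for p in range(len(tokens))]
    return lateral, new_tokens

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def laplacian_smoothness(W, X):
    """Assumed smoothness penalty: 1/2 * sum_ij W_ij * ||x_i - x_j||^2 = tr(X^T L X)."""
    n = len(X)
    return 0.5 * sum(W[i][j] * sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                     for i in range(n) for j in range(n))

# Illustrative two-part example with two tokens per part.
parts = [[1.0, 0.0], [0.0, 1.0]]
toks = [[[0.5, 0.5], [1.0, 0.0]], [[0.0, 1.0], [0.2, 0.8]]]
adj = [[0.0, 1.0], [1.0, 0.0]]
refined_parts, refined_toks = hierarchical_pass(parts, toks, adj)
penalty = laplacian_smoothness(adj, refined_parts)
```

The penalty is zero when connected parts share identical embeddings, and it grows with the weighted squared distance between neighbors. Adding it to a training loss would therefore pull adjacent parts toward consistent representations.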
