TL;DR
This paper introduces a novel model that learns canonical graph representations to improve scene graph to image generation, especially for complex scenes, demonstrating better performance and robustness across multiple benchmarks.
Contribution
The work presents a new approach that captures semantic equivalence in scene graphs by learning canonical representations, enhancing image generation quality for complex scenes.
Findings
Improved image generation on large scene graphs
Robustness to noise in input scene graphs
Better generalization on semantically equivalent graphs
Abstract
Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses these issues by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes. Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization on semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks:…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
