SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
Zhiyuan Zhang, DongDong Chen, Jing Liao

TL;DR
This paper presents SGEdit, a novel framework that combines large language models with Text2Image generative models to enable precise, flexible, and scene-aware image editing using scene graphs.
Contribution
It introduces a new scene graph-based image editing approach that leverages LLMs for scene parsing and editing control, enhancing editing accuracy and scene coherence.
Findings
Outperforms existing methods in editing precision
Achieves higher scene aesthetic quality
Enables object-level modifications with fine-grained control
Abstract
Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. They can serve as a natural interface for image editing, dramatically improving precision and flexibility. Leveraging this benefit, we introduce a new framework that integrates a large language model (LLM) with a Text2Image generative model for scene graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising overall image integrity. Our approach involves two primary stages: 1) Using an LLM-driven scene parser, we construct an image's scene graph, capturing key objects and their interrelationships, as well as parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning with a fine-tuned diffusion…
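To make the scene graph idea concrete, here is a minimal Python sketch of the data structure the abstract describes: nodes are objects (with a description and a mask handle), edges are relationships, and an edit is an operation on the graph. This is an illustration only, not the authors' implementation; every class, field, and method name (`SceneObject`, `Relation`, `SceneGraph`, `replace_object`, and so on) is hypothetical, and the diffusion-based image synthesis step is not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneObject:
    """A node: one object in the image (field names are illustrative)."""
    name: str
    description: str = ""           # fine-grained text attribute
    mask_id: Optional[str] = None   # placeholder handle for an object mask


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    objects: Dict[str, SceneObject] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_object(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        # Both endpoints must already exist as nodes.
        for name in (subject, obj):
            if name not in self.objects:
                raise KeyError(f"unknown object: {name}")
        self.relations.append(Relation(subject, predicate, obj))

    def remove_object(self, name: str) -> None:
        # Drop the node and every edge that touches it.
        del self.objects[name]
        self.relations = [
            r for r in self.relations if name not in (r.subject, r.obj)
        ]

    def replace_object(self, old: str, new: SceneObject) -> None:
        # Swap one node for another and rewire its edges.
        if old not in self.objects:
            raise KeyError(f"unknown object: {old}")
        del self.objects[old]
        self.objects[new.name] = new
        self.relations = [
            Relation(
                new.name if r.subject == old else r.subject,
                r.predicate,
                new.name if r.obj == old else r.obj,
            )
            for r in self.relations
        ]

    def triples(self) -> List[Tuple[str, str, str]]:
        return [(r.subject, r.predicate, r.obj) for r in self.relations]


# Usage: build a two-object graph, then apply an object-level edit.
graph = SceneGraph()
graph.add_object(SceneObject("man", "a man in a red jacket"))
graph.add_object(SceneObject("horse", "a brown horse"))
graph.relate("man", "riding", "horse")
graph.replace_object("horse", SceneObject("bicycle", "a blue bicycle"))
print(graph.triples())  # [('man', 'riding', 'bicycle')]
```

In this sketch an edit touches only the affected node and its edges, which mirrors the abstract's point that a scene graph lets a user change one object or relationship while the rest of the scene description stays intact.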
Taxonomy
Topics: Image Retrieval and Classification Techniques · Semantic Web and Ontologies
Methods: Diffusion
