SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing

Zhiyuan Zhang; DongDong Chen; Jing Liao

arXiv:2410.11815·cs.CV·October 16, 2024

Open Access

TL;DR

This paper presents SGEdit, a novel framework that combines large language models with Text2Image generative models to enable precise, flexible, and scene-aware image editing using scene graphs.

Contribution

It introduces a new scene graph-based image editing approach that leverages LLMs for scene parsing and editing control, enhancing editing accuracy and scene coherence.

Findings

01 Outperforms existing methods in editing precision

02 Achieves higher scene aesthetic quality

03 Enables object-level modifications with fine-grained control

Abstract

Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. They can serve as a natural interface for image editing, dramatically improving precision and flexibility. Leveraging this benefit, we introduce a new framework that integrates a large language model (LLM) with a Text2Image generative model for scene graph-based image editing. This integration enables precise modifications at the object level and creative recomposition of scenes without compromising overall image integrity. Our approach involves two primary stages: 1) Utilizing an LLM-driven scene parser, we construct an image's scene graph, capturing key objects and their interrelationships, as well as parsing fine-grained attributes such as object masks and descriptions. These annotations facilitate concept learning with a fine-tuned diffusion…
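To make the scene graph representation concrete, here is a minimal Python sketch of a graph whose nodes are objects and whose edges are relationships, together with a few object-level edits (replace, re-relate, remove). It is only an illustration of the data structure the abstract describes, not the paper's implementation: the class `SceneGraph`, its method names, and the example objects are all hypothetical, and no LLM parsing or diffusion model is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Toy scene graph: objects as nodes, relationships as directed edges.

    All names here are hypothetical and chosen for illustration only.
    """

    # node id -> free-text description of the object
    nodes: Dict[str, str] = field(default_factory=dict)
    # (subject id, predicate, object id) triples
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, obj_id: str, description: str) -> None:
        self.nodes[obj_id] = description

    def relate(self, subj: str, predicate: str, obj: str) -> None:
        # Both endpoints must already exist as nodes.
        if subj not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        self.edges.append((subj, predicate, obj))

    def remove_object(self, obj_id: str) -> None:
        # Drop the node and every edge that touches it.
        del self.nodes[obj_id]
        self.edges = [e for e in self.edges if obj_id not in (e[0], e[2])]

    def replace_object(self, old_id: str, new_id: str, description: str) -> None:
        # Swap one object for another while keeping its relationships.
        if old_id not in self.nodes:
            raise KeyError(old_id)
        self.nodes = {
            (new_id if k == old_id else k): (description if k == old_id else v)
            for k, v in self.nodes.items()
        }
        self.edges = [
            (new_id if s == old_id else s, p, new_id if o == old_id else o)
            for s, p, o in self.edges
        ]

    def set_relation(self, subj: str, obj: str, predicate: str) -> None:
        # Change the predicate on every edge from subj to obj.
        self.edges = [
            (s, predicate if (s, o) == (subj, obj) else p, o)
            for s, p, o in self.edges
        ]


# Build a small graph, then apply three object-level edits.
graph = SceneGraph()
graph.add_object("girl", "a girl in a red coat")
graph.add_object("horse", "a brown horse")
graph.add_object("bird", "a small bird")
graph.relate("girl", "riding", "horse")
graph.relate("bird", "perched on", "horse")

graph.replace_object("horse", "camel", "a camel")
graph.set_relation("girl", "camel", "standing next to")
graph.remove_object("bird")

print(graph.nodes)
print(graph.edges)
```

In this sketch each edit is a small, local change to the graph, which is the property that makes a scene graph a convenient editing interface; translating such edits into image changes is the part the paper's pipeline addresses.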

Taxonomy

Topics: Image Retrieval and Classification Techniques · Semantic Web and Ontologies

Methods: Diffusion