InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning

Tiancheng Li; Jinxiu Liu; Huajun Chen; Qi Liu

arXiv:2406.09973 · cs.CV · June 17, 2024

Open Access

TL;DR

InstructRL4Pix introduces a reinforcement learning approach to train diffusion models for precise, instruction-guided image editing, overcoming dataset limitations and improving localization of editing regions in complex images.

Contribution

The paper presents a novel reinforcement learning framework that guides diffusion models using attention maps for accurate image editing based on natural language commands.

Findings

01. Outperforms traditional dataset-based methods in image editing accuracy

02. Effectively localizes editing regions in complex images

03. Achieves high-quality object insertion, removal, and transformation

Abstract

Instruction-based image editing has made great progress in using natural human language to manipulate the visual content of images. However, existing models are limited by the quality of the dataset and cannot accurately localize editing regions in images with complex object relationships. In this paper, we propose a Reinforcement Learning Guided Image Editing Method (InstructRL4Pix) to train a diffusion model to generate images that are guided by the attention maps of the target object. Our method uses the distance between attention maps as the reward function and fine-tunes the diffusion model with proximal policy optimization (PPO) to maximize the output of the reward model. We evaluate our model on object insertion, removal, replacement, and transformation. Experimental results show that InstructRL4Pix breaks through the limitations of traditional datasets and uses…
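The two ingredients named in the abstract, an attention-map distance used as a reward and a PPO update, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the distance metric, so a mean squared distance between sum-normalized maps is assumed here, and the function names `attention_reward` and `ppo_clip_loss`, as well as the `clip_eps` default, are hypothetical. The loss shown is the standard PPO clipped surrogate objective.

```python
import numpy as np


def attention_reward(pred_map, target_map, eps=1e-8):
    """Reward that grows as two attention maps get closer.

    Assumption: both maps are non-negative arrays of the same shape, and
    the distance is a mean squared difference after sum-normalization.
    The paper may use a different distance.
    """
    p = pred_map / (pred_map.sum() + eps)
    t = target_map / (target_map.sum() + eps)
    return -float(np.mean((p - t) ** 2))


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old are log-probabilities of the sampled actions under
    the current and the data-collecting policy; advantages would be derived
    from rewards such as attention_reward above.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    target = np.zeros((4, 4))
    target[1:3, 1:3] = 1.0          # attention concentrated on the object
    off_target = np.zeros((4, 4))
    off_target[0, 0] = 1.0          # attention on the wrong region

    print(attention_reward(target, target))       # best possible reward
    print(attention_reward(off_target, target))   # lower (more negative)
```

In this sketch an edit whose attention lands on the intended region receives a reward near zero, while an edit that attends elsewhere is penalized; the clipping in `ppo_clip_loss` limits how far a single update can move the policy away from the one that generated the samples.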

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Diffusion