ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content Filling

Chaojie Mao; Jingfeng Zhang; Yulin Pan; Zeyinzi Jiang; Zhen Han; Yu Liu; Jingren Zhou

arXiv:2501.02487 · cs.CV · January 16, 2025

TL;DR

ACE++ is a versatile instruction-based diffusion framework that enhances image creation and editing by extending input paradigms and employing a two-stage training scheme, achieving high-quality results with flexible finetuning options.

Contribution

The paper introduces ACE++, a novel framework that generalizes instruction-based image editing and generation, with an efficient two-stage training process and comprehensive model offerings.

Findings

01 Superior image quality in generated and edited images.

02 Effective prompt following across diverse tasks.

03 Flexible finetuning options for different scenarios.

Abstract

We report ACE++, an instruction-based diffusion framework that tackles various image generation and editing tasks. Inspired by the input format for the inpainting task proposed by FLUX.1-Fill-dev, we improve the Long-context Condition Unit (LCU) introduced in ACE and extend this input paradigm to arbitrary editing and generation tasks. To take full advantage of image generative priors, we develop a two-stage training scheme that minimizes the effort of finetuning powerful text-to-image diffusion models such as FLUX.1-dev. In the first stage, we pre-train the model using task data with the 0-ref tasks from the text-to-image model. Many community models obtained by post-training text-to-image foundation models already fit this first-stage training paradigm. For example, FLUX.1-Fill-dev deals primarily with painting tasks and can be used as an initialization to accelerate…
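The inpainting-style input format mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conditioning input is formed by concatenating a masked context image, the mask, and noise along the channel axis. The function name `build_fill_input`, the channel order, and the use of pixel arrays rather than latents are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def build_fill_input(image, mask, rng):
    """Hypothetical helper: assemble a fill-style conditioning tensor.

    image: (H, W, C) float array holding the context to preserve.
    mask:  (H, W) array, 1 where content should be generated, 0 elsewhere.
    Returns an (H, W, 2*C + 1) array: [masked context | mask | noise].
    """
    mask = mask.astype(image.dtype)[..., None]        # (H, W, 1)
    context = image * (1.0 - mask)                    # blank out the region to fill
    noise = rng.standard_normal(image.shape).astype(image.dtype)
    return np.concatenate([context, mask, noise], axis=-1)

rng = np.random.default_rng(0)

# Editing-style input: keep the image, regenerate only the masked square.
img = np.ones((4, 4, 3), dtype=np.float32)
edit_mask = np.zeros((4, 4))
edit_mask[1:3, 1:3] = 1
edit_input = build_fill_input(img, edit_mask, rng)

# Generation-style input with no reference (the "0-ref" case, as assumed here):
# an empty context and a mask that covers the whole canvas.
gen_input = build_fill_input(np.zeros_like(img), np.ones((4, 4)), rng)
```

Under this assumed layout, editing and reference-free generation differ only in the context image and the mask, which is one way to read the claim that a single input paradigm covers both kinds of task.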

Code & Models

Models

🤗 ali-vilab/ACE_Plus
model · 141 dl · ♡ 304

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization

Methods: Sparse Evolutionary Training · Diffusion · Inpainting