ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Zhen Han; Zeyinzi Jiang; Yulin Pan; Jingfeng Zhang; Chaojie Mao; Chenwei Xie; Yu Liu; Jingren Zhou

arXiv:2410.00086·cs.CV·November 6, 2024

TL;DR

ACE introduces a unified Transformer-based diffusion model that handles diverse visual generation and editing tasks under multi-modal conditions, along with a comprehensive benchmark.

Contribution

The paper presents ACE, a novel Transformer-based diffusion model with a unified condition format, and a new data collection method for multi-task visual generation and editing.

Findings

01. ACE achieves comparable performance to expert models across tasks.

02. The model demonstrates superior results in visual generation benchmarks.

03. It enables multi-modal image creation and editing with a single unified system.

Abstract

Diffusion models have emerged as a powerful generative technology and have been found to be applicable in various scenarios. Most existing foundational diffusion models are designed primarily for text-guided visual generation and do not support multi-modal conditions, which are essential for many visual editing tasks. This limitation prevents these foundational diffusion models from serving as a unified model in the field of visual generation, as GPT-4 does in natural language processing. In this work, we propose ACE, an All-round Creator and Editor, which achieves performance comparable to that of expert models across a wide range of visual generation tasks. To achieve this goal, we first introduce a unified condition format termed the Long-context Condition Unit (LCU), and propose a novel Transformer-based diffusion model that uses the LCU as input, aiming for joint training across…
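The abstract excerpt names the Long-context Condition Unit but does not describe its layout. As a rough illustration only, the sketch below shows how a single condition format might cover both text-only generation and image editing, with earlier turns kept as context. The class names, field names, and the `flatten` helper are all assumptions made for this example and are not taken from the paper or its code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConditionUnit:
    """One request: a text instruction plus zero or more images (hypothetical layout)."""
    instruction: str
    # Image identifiers stand in for pixel data in this sketch.
    images: List[str] = field(default_factory=list)
    # One optional mask per image; None means "no mask for this image".
    masks: List[Optional[str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.masks:
            self.masks = [None] * len(self.images)
        if len(self.masks) != len(self.images):
            raise ValueError("each image needs exactly one mask slot (None if unused)")


@dataclass
class LongContextConditionUnit:
    """An ordered history of condition units (hypothetical layout)."""
    history: List[ConditionUnit] = field(default_factory=list)

    def append(self, unit: ConditionUnit) -> None:
        self.history.append(unit)

    def flatten(self) -> Tuple[List[str], List[str]]:
        """Return all instructions and all images in turn order."""
        texts = [unit.instruction for unit in self.history]
        images = [img for unit in self.history for img in unit.images]
        return texts, images


# A text-only generation turn followed by an editing turn with an image and mask.
lcu = LongContextConditionUnit()
lcu.append(ConditionUnit("a cat on a sofa"))
lcu.append(ConditionUnit("make the sofa red", images=["turn1.png"], masks=["sofa_mask.png"]))
texts, images = lcu.flatten()
```

In this sketch a text-to-image request is simply a unit with no images, so generation and editing share one input shape. How ACE actually encodes and feeds these inputs to the Transformer is not covered by the excerpt above.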

Taxonomy

Topics: Innovative Teaching and Learning Methods

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding