UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

Lunhao Duan; Shanshan Zhao; Wenjun Yan; Yinglun Li; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; Mingming Gong; Gui-Song Xia

arXiv:2412.18928 · cs.CV · December 30, 2024

TL;DR

This paper introduces UNIC-Adapter, a unified multi-modal transformer framework that enables flexible, controllable image generation from diverse inputs without needing multiple specialized models.

Contribution

The paper presents a novel unified adapter built on a multi-modal diffusion transformer, allowing controllable image synthesis across various conditions within a single model.

Findings

01 Effective control over pixel-level layouts and styles

02 Versatile performance across multiple image generation tasks

03 Outperforms specialized models in controllability

Abstract

Recently, text-to-image generation models have achieved remarkable advancements, particularly with diffusion models facilitating high-quality image synthesis from textual descriptions. However, these models often struggle with achieving precise control over pixel-level layouts, object appearances, and global styles when using text prompts alone. To mitigate this issue, previous works introduce conditional images as auxiliary inputs for image generation, enhancing control but typically necessitating specialized models tailored to different types of reference inputs. In this paper, we explore a new approach to unify controllable generation within a single framework. Specifically, we propose the unified image-instruction adapter (UNIC-Adapter) built on the Multi-Modal-Diffusion Transformer architecture, to enable flexible and controllable generation across diverse conditions without the…

Code & Models

Models

🤗 AIDC-AI/UNIC-Adapter (model · ♡ 6)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques

Methods: Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Attention Is All You Need · Dense Connections · Residual Connection · Diffusion · Multi-Head Attention