SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

Shanshan Zhong; Zhongzhan Huang; Wushao Wen; Jinghui Qin; Liang Lin

arXiv:2305.05189 · cs.CL · November 30, 2023 · 2 citations

TL;DR

This paper introduces SUR-adapter, a parameter-efficient method that leverages large language models and a new dataset to improve diffusion models' understanding of narrative prompts for higher-quality text-to-image generation.

Contribution

The paper proposes SUR-adapter, a novel fine-tuning approach that enhances diffusion models' semantic understanding using knowledge distillation from large language models and a new annotated dataset.

Findings

1. Improved image quality with narrative prompts
2. Enhanced semantic understanding in diffusion models
3. Effective knowledge transfer from LLMs to diffusion models

Abstract

Diffusion models, which have become popular text-to-image generation models, can produce high-quality, content-rich images guided by textual prompts. However, existing models are limited in semantic understanding and commonsense reasoning when the input prompts are concise narratives, which results in low-quality image generation. To improve their handling of narrative prompts, we propose a simple yet effective parameter-efficient fine-tuning approach for pre-trained diffusion models called the Semantic Understanding and Reasoning adapter (SUR-adapter). To this end, we first collect and annotate a new dataset, SURD, which consists of more than 57,000 semantically corrected multi-modal samples. Each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. Then, we align the semantic representation of narrative…
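
The abstract and contribution statement describe an adapter that aligns narrative-prompt representations with complex-prompt representations and absorbs LLM knowledge through distillation. The following is a minimal NumPy sketch of that kind of training objective, not the authors' implementation. The residual bottleneck adapter, the linear projection into the LLM feature space, the mean pooling, the MSE losses, the weights `lam_align`/`lam_distill`, and all dimensions are hypothetical choices for illustration. The frozen text encoder and LLM are replaced by random arrays.

```python
import numpy as np

# Hypothetical toy sizes. A real setup would use the text encoder's shape
# (e.g. 77 tokens x 768 dims for Stable Diffusion 1.x) and the LLM's hidden size.
TOKENS, D_TEXT, D_LLM, BOTTLENECK = 8, 16, 32, 4

rng = np.random.default_rng(0)


class ResidualAdapter:
    """Hypothetical bottleneck adapter: h + relu(h W_down^T) W_up^T.

    Only these two small matrices would be trained; the text encoder that
    produces `h` stays frozen (parameter-efficient fine-tuning).
    """

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(bottleneck, dim))
        # Zero-initialised up-projection: the adapter starts as the identity map,
        # so the pre-trained model's behaviour is unchanged before training.
        self.w_up = np.zeros((dim, bottleneck))

    def __call__(self, h):
        # h: (tokens, dim) token embeddings of the narrative prompt
        return h + np.maximum(h @ self.w_down.T, 0.0) @ self.w_up.T


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def sur_style_loss(adapter, proj, narrative_emb, complex_emb, llm_feat,
                   lam_align=1.0, lam_distill=1.0):
    adapted = adapter(narrative_emb)
    # Alignment: pull the adapted narrative-prompt embedding toward the
    # (frozen) embedding of the matching complex keyword-based prompt.
    align = mse(adapted, complex_emb)
    # Distillation: mean-pool over tokens, project into the LLM feature space,
    # and match the (frozen) LLM representation of the narrative prompt.
    distill = mse(adapted.mean(axis=0) @ proj, llm_feat)
    return lam_align * align + lam_distill * distill, align, distill


# Random stand-ins for frozen-model outputs on one SURD-style sample
narrative = rng.normal(size=(TOKENS, D_TEXT))   # text encoder(narrative prompt)
complex_ = rng.normal(size=(TOKENS, D_TEXT))    # text encoder(complex prompt)
llm_feat = rng.normal(size=(D_LLM,))            # LLM feature for the narrative prompt

adapter = ResidualAdapter(D_TEXT, BOTTLENECK, rng)
proj = rng.normal(0.0, 0.02, size=(D_TEXT, D_LLM))  # hypothetical trainable projection

total, align, distill = sur_style_loss(adapter, proj, narrative, complex_, llm_feat)
print(f"total={total:.4f} align={align:.4f} distill={distill:.4f}")
```

In a training loop, gradients of `total` would update only `w_down`, `w_up`, and `proj` (128 adapter weights here), leaving every pre-trained component frozen. Because `w_up` starts at zero, the first forward pass returns the narrative embedding unchanged. This is a common adapter design choice that keeps fine-tuning from disrupting the pre-trained model at step zero.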

Code & Models

Repositories

Qrange-group/SUR-adapter (PyTorch, official)

Models

zhongshsh/SUR-adapter (Hugging Face model)

Datasets

zhongshsh/SURD (dataset, 41 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Adapter · Knowledge Distillation · ALIGN