Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

Zeqiang Lai; Xizhou Zhu; Jifeng Dai; Yu Qiao; Wenhai Wang

arXiv:2310.07653 · cs.AI · October 16, 2023 · 5 cites

TL;DR

Mini-DALLE3 introduces an interactive text-to-image system that enables natural language-based image generation, editing, and refinement by leveraging large language models and existing T2I models without additional training.

Contribution

The paper presents a simple prompting-based approach to enable interactive text-to-image capabilities in existing LLMs and T2I models, enhancing human-machine interaction.
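One way to picture a prompting-based setup like this is a thin router between an LLM and an off-the-shelf T2I model. The sketch below is illustrative only and is not taken from the paper's code: the `<image>` tag convention, the `SYSTEM_PROMPT` wording, and the names `route_reply` and `fake_t2i` are all assumptions made for this example. It shows how a chat reply containing tagged image descriptions could be split into display text and T2I calls without training either model.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag convention: the summary above does not say how the LLM
# marks an image request, so <image>...</image> is an assumption.
IMAGE_TAG = re.compile(r"<image>(.*?)</image>", re.DOTALL)

# Hypothetical system prompt that asks the LLM to emit tagged descriptions.
SYSTEM_PROMPT = (
    "You are a helpful assistant that can also draw. "
    "When an image is needed, write a detailed description of it "
    "wrapped in <image> and </image> tags."
)


def route_reply(
    reply: str, text_to_image: Callable[[str], str]
) -> Tuple[str, List[str]]:
    """Replace each tagged description in an LLM reply with a placeholder.

    Returns the display text and the list of image references produced by
    the supplied text_to_image callable (one per tagged description).
    """
    images: List[str] = []

    def _render(match: "re.Match[str]") -> str:
        description = match.group(1).strip()
        images.append(text_to_image(description))
        return f"[image {len(images)}]"

    return IMAGE_TAG.sub(_render, reply), images


def fake_t2i(description: str) -> str:
    """Stand-in for a real T2I model; returns a fake image reference."""
    return "img://" + description.lower().replace(" ", "_")


# Example with a hard-coded reply in place of a real LLM call.
reply = "Sure! <image>A red fox in snow</image> Want any changes?"
text, images = route_reply(reply, fake_t2i)
print(text)
print(images)
```

In a real system, `fake_t2i` would be swapped for a call into an existing T2I model and `reply` would come from an LLM given `SYSTEM_PROMPT`; replies without tags pass through unchanged, which keeps ordinary conversation intact.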

Findings

01. Effective in various scenarios with different LLMs

02. Low-cost approach without additional training

03. Maintains LLMs' core capabilities

Abstract

The revolution in artificial intelligence content generation has been rapidly accelerated by booming text-to-image (T2I) diffusion models. Within just two years of development, state-of-the-art models have reached unprecedented quality, diversity, and creativity in what they can generate. However, a prevalent limitation persists in communicating effectively with popular T2I models, such as Stable Diffusion, using natural language descriptions. This typically makes an engaging image hard to obtain without expertise in prompt engineering with complex word compositions, magic tags, and annotations. Inspired by the recently released DALLE3, a T2I model built directly into ChatGPT that talks human language, we revisit the existing T2I systems, endeavoring to align with human intent, and introduce a new task, interactive text to image (iT2I), where people can interact with an LLM for…

Code & Models

Repositories

Zeqiang-Lai/MiniDALLE-3 (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Diffusion · ALIGN