Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang

TL;DR
Mini-DALLE3 introduces an interactive text-to-image system that enables natural language-based image generation, editing, and refinement by leveraging large language models and existing T2I models without additional training.
Contribution
The paper presents a simple prompting-based approach to enable interactive text-to-image capabilities in existing LLMs and T2I models, enhancing human-machine interaction.
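As a rough illustration of what a prompting-based iT2I loop might look like, the sketch below asks an LLM to wrap image descriptions in tags, then routes each tagged description to a T2I model. It is not the paper's implementation: the `<image>` tag convention, the `SYSTEM_PROMPT` wording, and the `fake_llm` and `fake_t2i` stand-ins are all assumptions made for this example.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical tag convention (an assumption, not taken from the summary above).
IMAGE_TAG = re.compile(r"<image>(.*?)</image>", re.DOTALL)

# Hypothetical instruction given to the LLM; no training is involved, only prompting.
SYSTEM_PROMPT = (
    "You are an assistant that can show images. When an image is needed, "
    "write its description inside <image>...</image> tags."
)


def route_response(llm_text: str, t2i: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Replace each tagged description with a placeholder and collect the generated images."""
    images: List[str] = []

    def _render(match: "re.Match[str]") -> str:
        description = match.group(1).strip()
        images.append(t2i(description))  # hand the description to the T2I model
        return f"[image {len(images)}]"

    return IMAGE_TAG.sub(_render, llm_text), images


def fake_llm(system_prompt: str, user_message: str) -> str:
    """Stand-in for a real chat LLM call (assumption for this sketch)."""
    return "Sure! <image>a corgi wearing a red scarf, watercolor</image> Want any changes?"


def fake_t2i(description: str) -> str:
    """Stand-in for a real T2I model; returns a fake image handle."""
    return f"generated://{description}"


text, images = route_response(fake_llm(SYSTEM_PROMPT, "Draw a corgi"), fake_t2i)
print(text)
print(images)
```

Because the routing step only inspects the LLM's text output, either stand-in could in principle be swapped for a different LLM or T2I backend without changing the rest of the loop.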
Findings
Effective in various scenarios with different LLMs
Low-cost approach without additional training
Maintains LLMs' core capabilities
Abstract
The revolution in artificial intelligence content generation has been rapidly accelerated by booming text-to-image (T2I) diffusion models. Within just two years of development, state-of-the-art models have reached unprecedented quality, diversity, and creativity in what they can generate. However, a prevalent limitation persists in communicating effectively with these popular T2I models, such as Stable Diffusion, using natural language descriptions. This typically makes an engaging image hard to obtain without expertise in prompt engineering with complex word compositions, magic tags, and annotations. Inspired by the recently released DALLE3, a T2I model built directly into ChatGPT that speaks human language, we revisit the existing T2I systems endeavoring to align human intent and introduce a new task, interactive text to image (iT2I), where people can interact with LLM for…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion · ALIGN
