Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
Soyeong Kwon, Taegyeong Lee, Taehwan Kim

TL;DR
This paper introduces a zero-shot method for infinite image synthesis guided by text, leveraging large language models to ensure global coherence and local context without needing high-resolution paired datasets.
Contribution
It presents a novel approach that uses LLMs for global and local image expansion, eliminating the need for high-resolution text-image paired training data.
Findings
Outperforms baseline models quantitatively and qualitatively
Demonstrates zero-shot arbitrary-sized image generation
Effectively maintains global coherence and local context
Abstract
Text-guided image editing and generation methods have diverse real-world applications. However, text-guided infinite image synthesis faces several challenges. First, there is a lack of text-image paired datasets with high resolution and contextual diversity. Second, expanding images based on text requires global coherence and a rich understanding of local context. Previous studies have mainly focused on limited categories, such as natural landscapes, and have also required training on high-resolution images with paired text. To address these challenges, we propose a novel approach that uses Large Language Models (LLMs) for both global coherence and local context understanding, without any high-resolution text-image paired training dataset. We train a diffusion model to expand an image conditioned on visual features and on global and local captions generated by the LLM. At the inference stage,…
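One way to picture the expansion described in the abstract is as a tiled outpainting loop. The excerpt is truncated before it describes inference, so the following is only a minimal sketch under stated assumptions. It assumes the canvas is covered by fixed-size, overlapping windows and that each window is paired with one shared global caption plus its own local caption. The names `Tile`, `plan_expansion`, `build_conditions`, and the `local_captioner` callback, which stands in for the LLM, are hypothetical. The diffusion model call itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tile:
    """One window of the output canvas plus its (hypothetical) text conditions."""
    x: int               # left pixel offset of the window
    y: int               # top pixel offset of the window
    global_caption: str  # shared description of the whole scene
    local_caption: str   # description specific to this window


def _axis_offsets(length: int, tile: int, stride: int) -> List[int]:
    """Offsets along one axis so that windows of size `tile` cover [0, length)."""
    offsets = list(range(0, length - tile + 1, stride))
    # Add a final window flush with the edge if the stride does not land on it.
    if offsets[-1] != length - tile:
        offsets.append(length - tile)
    return offsets


def plan_expansion(width: int, height: int, tile: int, stride: int) -> List[Tuple[int, int]]:
    """Row-major top-left offsets of overlapping windows covering the canvas."""
    if not 0 < stride <= tile:
        raise ValueError("stride must be in (0, tile]")
    if width < tile or height < tile:
        raise ValueError("canvas must be at least one tile in each dimension")
    xs = _axis_offsets(width, tile, stride)
    ys = _axis_offsets(height, tile, stride)
    return [(x, y) for y in ys for x in xs]


def build_conditions(
    offsets: List[Tuple[int, int]],
    global_caption: str,
    local_captioner: Callable[[str, int, int], str],
) -> List[Tile]:
    """Pair every window with the global caption and a per-window local caption.

    `local_captioner` is a placeholder for an LLM call that would derive a
    local description from the global one and the window position.
    """
    return [Tile(x, y, global_caption, local_captioner(global_caption, x, y))
            for x, y in offsets]


if __name__ == "__main__":
    windows = plan_expansion(width=1024, height=512, tile=512, stride=256)
    stub = lambda g, x, y: f"{g}; window at x={x}, y={y}"  # stand-in for the LLM
    for t in build_conditions(windows, "a beach at sunset", stub):
        print(t)
```

The overlap between neighbouring windows (`tile - stride` pixels) is what would let a model see already-generated pixels as visual context while the global caption keeps every window tied to the same scene. Clamping the final offset ensures that the last window ends exactly at the canvas edge.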
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
