T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation

Chieh-Yun Chen; Min Shi; Gong Zhang; Humphrey Shi

arXiv:2507.20536·cs.CV·July 30, 2025

TL;DR

T2I-Copilot is a training-free multi-agent system that improves prompt interpretation, model selection, and iterative refinement for text-to-image generation, leading to higher quality and better alignment without additional training.

Contribution

The paper introduces a training-free multi-agent framework that uses large language models to automate prompt engineering and improve image generation quality.

Findings

01

Achieves VQA scores comparable to commercial models.

02

Surpasses certain models in quality at lower cost.

03

Outperforms existing methods in prompt refinement and image quality.

Abstract

Text-to-Image (T2I) generative models have revolutionized content creation but remain highly sensitive to prompt phrasing, often requiring users to refine prompts repeatedly without clear feedback. While techniques such as automatic prompt engineering, controlled text embeddings, denoising, and multi-turn generation mitigate these issues, they offer limited controllability or require additional training, which restricts their ability to generalize. Thus, we introduce T2I-Copilot, a training-free multi-agent system that leverages collaboration between (Multimodal) Large Language Models to automate prompt phrasing, model selection, and iterative refinement. This approach significantly simplifies prompt engineering while enhancing generation quality and text-image alignment compared to direct generation. Specifically, T2I-Copilot consists of three agents: (1) Input…
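
The abstract names three automated steps: prompt phrasing, model selection, and iterative refinement. The following is a minimal sketch of how such a loop could be wired together, not the authors' implementation. Every name in it (`refine_loop`, `Attempt`, the `demo_*` stubs, `threshold`, `max_rounds`) is hypothetical, and the stubs stand in for the (Multimodal) LLM and image-generator calls a real system would make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Attempt:
    """One generation round: the prompt used, the model chosen, and its score."""
    prompt: str
    model: str
    score: float


def refine_loop(
    user_prompt: str,
    rewrite: Callable[[str, str], str],               # (user prompt, feedback) -> generation prompt
    select_model: Callable[[str, List[str]], str],    # (prompt, model names) -> chosen name
    generators: Dict[str, Callable[[str], object]],   # model name -> prompt -> image
    evaluate: Callable[[str, object], Tuple[float, str]],  # -> (score, feedback)
    threshold: float = 0.8,   # assumed stopping score, not a value from the paper
    max_rounds: int = 3,      # assumed round budget, not a value from the paper
) -> List[Attempt]:
    """Rewrite the prompt, pick a model, generate, score, and retry with feedback."""
    history: List[Attempt] = []
    prompt = rewrite(user_prompt, "")
    for _ in range(max_rounds):
        model = select_model(prompt, list(generators))
        image = generators[model](prompt)
        score, feedback = evaluate(user_prompt, image)
        history.append(Attempt(prompt, model, score))
        if score >= threshold:
            break
        prompt = rewrite(user_prompt, feedback)
    return history


# Hypothetical stubs so the sketch runs without any model or network access.
def demo_rewrite(user_prompt: str, feedback: str) -> str:
    return user_prompt if not feedback else f"{user_prompt}, {feedback}"


def demo_select(prompt: str, names: List[str]) -> str:
    return names[0]


def demo_generate(prompt: str) -> object:
    return prompt  # stand-in "image": just the prompt text


def demo_evaluate(user_prompt: str, image: object) -> Tuple[float, str]:
    ok = "sharp focus" in str(image)
    return (1.0 if ok else 0.2), ("" if ok else "sharp focus")


history = refine_loop(
    "a red fox in snow",
    demo_rewrite,
    demo_select,
    {"stub-model": demo_generate},
    demo_evaluate,
)
for attempt in history:
    print(attempt)
```

Passing the rewrite, selection, generation, and evaluation steps in as callables keeps the loop itself free of training or model-specific code, which mirrors the training-free framing of the abstract; how the paper actually divides these responsibilities among its agents is not shown in the truncated text above.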

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling