Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing

Shichao Ma; Yunhe Guo; Jiahao Su; Qihe Huang; Zhengyang Zhou; Yang Wang

arXiv:2508.06916 · cs.CV · August 12, 2025

TL;DR

Talk2Image is a multi-agent system that enables coherent, multi-turn image generation and editing through intention parsing, task decomposition, and collaborative refinement, improving controllability and user satisfaction.

Contribution

The paper introduces a multi-agent framework for interactive, multi-turn image editing that addresses the intention drift and incoherent edits seen in existing single-agent, dialogue-based systems.

Findings

01. Outperforms baselines in controllability and coherence
02. Enhances user satisfaction in iterative editing tasks
03. Enables step-by-step alignment with user intentions

Abstract

Text-to-image generation tasks have driven remarkable advances in diverse media applications, yet most focus on single-turn scenarios and struggle with iterative, multi-turn creative tasks. Recent dialogue-based systems attempt to bridge this gap, but their single-agent, sequential paradigm often causes intention drift and incoherent edits. To address these limitations, we present Talk2Image, a novel multi-agent system for interactive image generation and editing in multi-turn dialogue scenarios. Our approach integrates three key components: intention parsing from dialogue history, task decomposition and collaborative execution across specialized agents, and feedback-driven refinement based on a multi-view evaluation mechanism. Talk2Image enables step-by-step alignment with user intention and consistent image editing. Experiments demonstrate that Talk2Image outperforms existing…
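
The three components named in the abstract (intention parsing, task decomposition with specialized agents, and feedback-driven refinement) can be pictured as a per-turn control loop. The sketch below is only an illustration of that loop under loud assumptions: every name in it (`Session`, `Task`, `parse_intention`, `decompose`, `AGENTS`, `evaluate`, `run_turn`) is hypothetical and not taken from the paper, the "image" is a plain string standing in for real image data, and the agents and evaluator are toy stand-ins rather than the models Talk2Image actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    agent: str        # name of the (hypothetical) specialized agent
    instruction: str  # sub-instruction routed to that agent


@dataclass
class Session:
    history: List[str] = field(default_factory=list)  # user turns so far
    image: str = ""   # toy stand-in for the current image


def parse_intention(history: List[str]) -> dict:
    """Toy parser: latest turn is the current intention, the rest is context."""
    return {"current": history[-1], "context": history[:-1]}


def decompose(intention: dict) -> List[Task]:
    """Toy decomposition: split on ' and ', route each part by keyword."""
    tasks = []
    for part in intention["current"].split(" and "):
        part = part.strip()
        agent = "generate" if part.startswith("draw") else "edit"
        tasks.append(Task(agent, part))
    return tasks


# Toy agents: each maps (current image, instruction) to a new image string.
AGENTS: Dict[str, Callable[[str, str], str]] = {
    "generate": lambda image, instr: f"[{instr}]",
    "edit": lambda image, instr: f"{image}+[{instr}]",
}


def evaluate(image: str, tasks: List[Task]) -> Dict[str, float]:
    """Stand-in for multi-view evaluation: one score per view, in [0, 1]."""
    done = sum(1 for t in tasks if f"[{t.instruction}]" in image)
    return {"intention_alignment": done / len(tasks), "consistency": 1.0}


def run_turn(session: Session, user_turn: str,
             max_rounds: int = 3, threshold: float = 1.0) -> Dict[str, float]:
    """One dialogue turn: parse, decompose, execute, then refine on feedback."""
    session.history.append(user_turn)
    tasks = decompose(parse_intention(session.history))
    for _ in range(max_rounds):
        image = session.image
        for task in tasks:
            image = AGENTS[task.agent](image, task.instruction)
        scores = evaluate(image, tasks)
        if min(scores.values()) >= threshold:
            break  # every view is satisfied; stop refining
    session.image = image
    return scores


session = Session()
run_turn(session, "draw a cat and add a hat")
run_turn(session, "make the hat red")
print(session.image)
```

The point of the structure is that the dialogue history, not just the latest message, feeds the parser, and that the result of each turn is only committed after the evaluation step, which is one plausible way to limit the intention drift the abstract attributes to single-agent, sequential pipelines.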

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI