CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

Yuhan Pu; Hao Zheng; Ziqian Mo; Hill Zhang; Tianyi Fan; Shuhong Wu; Jiaheng Wei

arXiv:2604.03156 · cs.CV · April 6, 2026

TL;DR

CAMEO introduces a multi-agent, feedback-driven framework for conditional image editing that enhances quality, robustness, and structural accuracy over traditional single-step methods.

Contribution

It reformulates image editing as an iterative, quality-aware process with structured feedback, improving control and consistency in complex editing tasks.

Findings

01

CAMEO achieves a 20% higher win rate on average than state-of-the-art models.

02

The framework improves robustness and structural reliability in image editing.

03

Evaluation embedded within the editing loop enables iterative refinement and quality control.
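
To make the third finding concrete, here is a minimal Python sketch of an edit loop with evaluation embedded in it. It is not the authors' implementation: the function `refine`, the `Evaluation` record, the score scale, the `threshold` and `max_rounds` parameters, and the toy `toy_edit` / `toy_evaluate` stand-ins are all hypothetical names chosen for illustration, and the summary above does not specify how CAMEO's agents are actually structured.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    score: float   # quality score; the [0, 1] scale is an assumption
    feedback: str  # textual critique handed to the next editing round


def refine(
    source: str,
    prompt: str,
    edit: Callable[[str, str, str], str],
    evaluate: Callable[[str, str, str], Evaluation],
    threshold: float = 0.8,   # hypothetical acceptance threshold
    max_rounds: int = 3,      # hypothetical retry budget
) -> Tuple[str, List[float]]:
    """Edit, evaluate, and retry with feedback until the score passes
    the threshold or the round budget is exhausted."""
    best, best_score = source, float("-inf")
    feedback = ""
    history: List[float] = []
    for _ in range(max_rounds):
        # Always edit from the original source to limit cumulative drift.
        candidate = edit(source, prompt, feedback)
        result = evaluate(source, candidate, prompt)
        history.append(result.score)
        if result.score > best_score:
            best, best_score = candidate, result.score
        if result.score >= threshold:
            break
        feedback = result.feedback
    return best, history


# Toy stand-ins so the sketch runs without any image model.
def toy_edit(source: str, prompt: str, feedback: str) -> str:
    parts = [source, prompt] + ([feedback] if feedback else [])
    return "|".join(parts)


def toy_evaluate(source: str, candidate: str, prompt: str) -> Evaluation:
    # Pretend the edit is only acceptable once the critique was applied.
    score = 0.9 if "fix" in candidate else 0.5
    return Evaluation(score=score, feedback="fix pose")


if __name__ == "__main__":
    image, scores = refine("img", "add cone", toy_edit, toy_evaluate)
    print(image, scores)
```

In this sketch the first round scores below the threshold, so its critique is passed back into the second round, which then passes. A real system would replace the string stand-ins with an image editor and a learned or rule-based evaluator.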

Abstract

Conditional image editing aims to modify a source image according to textual prompts and optional reference guidance. Such editing is crucial in scenarios requiring strict structural control (e.g., anomaly insertion in driving scenes and complex human pose transformation). Despite recent advances in large-scale editing models (e.g., Seedream and Nano Banana), most approaches rely on single-step generation. This paradigm often lacks explicit quality control, may introduce excessive deviation from the original image, and frequently produces structural artifacts or environment-inconsistent modifications, typically requiring manual prompt tuning to achieve acceptable results. We propose CAMEO, a structured multi-agent framework that reformulates conditional editing as a quality-aware, feedback-driven process rather than a one-shot generation task. CAMEO decomposes editing into…
