PhotoAgent: Agentic Photo Editing with Exploratory Visual Aesthetic Planning
Mingde Yao, Zhiyuan You, King-Man Tam, Menglu Wang, Tianfan Xue

TL;DR
PhotoAgent is an autonomous photo editing system that plans and executes multi-step aesthetic edits using visual feedback, reducing user effort and improving image quality.
Contribution
It introduces a decision-making framework for autonomous image editing with aesthetic planning and a new benchmark for evaluation.
Findings
PhotoAgent outperforms baseline methods in instruction adherence.
It achieves higher visual quality in edited images.
The system effectively plans multi-step edits using visual feedback.
Abstract
With the rapid recent development of generative models, instruction-based image editing has shown great potential for generating high-quality images. However, editing quality depends heavily on carefully designed instructions, placing the burden of task decomposition and sequencing entirely on the user. To achieve autonomous image editing, we present PhotoAgent, a system that advances image editing through explicit aesthetic planning. Specifically, PhotoAgent formulates autonomous image editing as a long-horizon decision-making problem. It reasons over the user's aesthetic intent, plans multi-step editing actions via tree search, and iteratively refines results through closed-loop execution with memory and visual feedback, without requiring step-by-step user prompts. To support reliable evaluation in real-world scenarios, we introduce UGC-Edit, an aesthetic evaluation benchmark consisting…
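The abstract describes planning multi-step edits via tree search, guided by feedback on intermediate results. As a rough illustration of that idea only, the sketch below runs a small beam search (one simple tree-search variant, chosen here for brevity and not confirmed as the paper's algorithm) over a toy "image" represented by two numbers. Everything in it is an assumption made for illustration: the `State` representation, the `ACTIONS` table, the `toy_score` function standing in for an aesthetic evaluator, and the `plan_edits` name do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for an image: a few global attributes (hypothetical).
State = Dict[str, float]
Node = Tuple[float, State, List[str]]  # (score, state, edit history)


def adjust(key: str, delta: float) -> Callable[[State], State]:
    """Build a hypothetical edit action that shifts one attribute."""
    def apply(state: State) -> State:
        new_state = dict(state)  # copy so earlier states stay unchanged
        new_state[key] = new_state[key] + delta
        return new_state
    return apply


# Hypothetical action set; a real system would call editing tools instead.
ACTIONS: Dict[str, Callable[[State], State]] = {
    "brighten": adjust("brightness", 0.1),
    "darken": adjust("brightness", -0.1),
    "more_contrast": adjust("contrast", 0.1),
    "less_contrast": adjust("contrast", -0.1),
}


def toy_score(state: State, target: State) -> float:
    """Stand-in for an aesthetic evaluator: closer to target is better."""
    return -sum(abs(state[k] - target[k]) for k in target)


def plan_edits(start: State, score: Callable[[State], float],
               depth: int = 3, beam_width: int = 2) -> Node:
    """Beam search over edit sequences; returns the best node seen."""
    beam: List[Node] = [(score(start), start, [])]
    best = beam[0]
    for _ in range(depth):
        candidates: List[Node] = []
        for _, state, history in beam:
            # Expand every action from each state kept in the beam.
            for name, action in ACTIONS.items():
                nxt = action(state)
                candidates.append((score(nxt), nxt, history + [name]))
        # Keep only the highest-scoring candidates for the next level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best


start = {"brightness": 0.3, "contrast": 0.5}
target = {"brightness": 0.5, "contrast": 0.6}
best_score, best_state, best_plan = plan_edits(
    start, lambda s: toy_score(s, target))
print(best_plan, round(best_score, 6))
```

In this toy setting the score function plays the role of feedback on each candidate result. A system of the kind the abstract describes would presumably score rendered images rather than two scalar attributes, and would also carry memory across iterations, which this sketch omits.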
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
