RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward

Qiucheng Wu; Jing Shi; Simon Jenni; Kushal Kafle; Tianyu Wang; Shiyu Chang; Handong Zhao

arXiv:2602.17558 · cs.CV · February 20, 2026

TL;DR

RetouchIQ introduces a reinforcement learning (RL) framework in which a generalist reward model guides multimodal large language model (MLLM) agents to perform high-quality, explainable, instruction-based image retouching with professional editing tools.

Contribution

The paper presents RetouchIQ, a novel RL-based approach with a generalist reward model for instruction-driven image editing, surpassing traditional rule-based methods.

Findings

1. Significant improvement in semantic consistency of edited images.
2. Enhanced perceptual quality over previous systems.
3. Established a new benchmark with a 190k instruction-reasoning dataset.

Abstract

Recent advances in multimodal large language models (MLLMs) have shown great potential for extending vision-language reasoning to professional tool-based image editing, enabling intuitive and creative editing. A promising direction is to use reinforcement learning (RL) to enable MLLMs to reason about and execute optimal tool-use plans within professional image-editing software. However, training remains challenging due to the lack of reliable, verifiable reward signals that can reflect the inherently subjective nature of creative editing. In this work, we introduce RetouchIQ, a framework that performs instruction-based executable image editing through MLLM agents guided by a generalist reward model. RetouchIQ interprets user-specified editing intentions and generates corresponding, executable image adjustments, bridging high-level aesthetic goals with precise parameter control. To move…
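The abstract describes the agent's output as executable image adjustments that give precise parameter control. Below is a minimal sketch of what such an output might look like. It assumes a tiny hypothetical action space of three slider-style parameters (`exposure`, `contrast`, `saturation`) with made-up ranges and simple textbook formulas. It is not RetouchIQ's actual tool interface, and it does not describe any particular editing application. Here the agent's plan is a plain dictionary of parameter values, which is first validated and then executed deterministically on RGB pixels.

```python
from typing import Dict, List, Tuple

# An RGB pixel with channels in [0, 1].
Pixel = Tuple[float, float, float]

# Hypothetical action space: slider-style adjustments with allowed ranges.
# These names, ranges, and formulas are illustrative assumptions,
# not RetouchIQ's actual editing tools.
PARAM_RANGES: Dict[str, Tuple[float, float]] = {
    "exposure": (-2.0, 2.0),    # in stops (each stop doubles brightness)
    "contrast": (-1.0, 1.0),    # 0 = unchanged
    "saturation": (-1.0, 1.0),  # -1 = grayscale, 0 = unchanged
}


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def validate_plan(plan: Dict[str, float]) -> Dict[str, float]:
    """Reject unknown adjustments and clamp values to their allowed ranges."""
    clean: Dict[str, float] = {}
    for name, value in plan.items():
        if name not in PARAM_RANGES:
            raise ValueError(f"unknown adjustment: {name}")
        lo, hi = PARAM_RANGES[name]
        clean[name] = clamp(float(value), lo, hi)
    return clean


def apply_plan(pixels: List[Pixel], plan: Dict[str, float]) -> List[Pixel]:
    """Execute a validated adjustment plan on a list of RGB pixels."""
    params = validate_plan(plan)
    gain = 2.0 ** params.get("exposure", 0.0)
    contrast = 1.0 + params.get("contrast", 0.0)
    saturation = 1.0 + params.get("saturation", 0.0)
    out: List[Pixel] = []
    for r, g, b in pixels:
        # Exposure: multiplicative gain of 2**stops.
        r, g, b = r * gain, g * gain, b * gain
        # Contrast: scale each channel's distance from mid-gray (0.5).
        r, g, b = [(c - 0.5) * contrast + 0.5 for c in (r, g, b)]
        # Saturation: blend each channel toward Rec. 709 luma.
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        r, g, b = [y + (c - y) * saturation for c in (r, g, b)]
        # Clip back to the displayable range.
        out.append((clamp(r, 0.0, 1.0), clamp(g, 0.0, 1.0), clamp(b, 0.0, 1.0)))
    return out


if __name__ == "__main__":
    # A plan the agent might emit for "brighten slightly and mute the colors".
    plan = {"exposure": 0.5, "saturation": -0.3}
    print(apply_plan([(0.2, 0.4, 0.6)], plan))
```

Because the plan is just a handful of named numbers, it can be checked, clamped, logged, and replayed on the original image. This illustrates why parameter-level edits are easier to inspect and reproduce than directly regenerated pixels.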

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection