TL;DR
VeraRetouch is a lightweight, fully differentiable framework for multi-task photo retouching that leverages a large vision-language model and a novel dataset, enabling end-to-end training and mobile deployment.
Contribution
The paper introduces VeraRetouch, a fully differentiable multi-task retouching framework with a new large-scale dataset and reinforcement learning strategy, advancing end-to-end photo retouching.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Enables mobile deployment with a smaller model footprint.
Introduces the first million-scale professional retouching dataset.
Abstract
Reasoning photo retouching has gained significant traction, requiring models to analyze image defects, explain their reasoning, and execute precise retouching enhancements. However, existing approaches often rely on non-differentiable external software, which creates optimization barriers, and they suffer from high parameter redundancy and limited generalization. To address these challenges, we propose VeraRetouch, a lightweight and fully differentiable framework for multi-task photo retouching. We employ a 0.5B Vision-Language Model (VLM) as the central intelligence to formulate retouching plans based on instructions and scene semantics. Furthermore, we develop a fully differentiable Retouch Renderer that replaces external tools, enabling direct end-to-end pixel-level training through decoupled control latents for lighting, global color, and specific color adjustments. To overcome data…
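The abstract does not say how the Retouch Renderer is parameterized, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes three hypothetical controls, one per adjustment group named above: an exposure value for lighting, per-channel gains for global color, and a soft-masked saturation boost for one specific color. Every function and parameter name here is invented for illustration. Because each step is a smooth arithmetic operation, a pixel-level loss varies smoothly with the controls, which is the property a non-differentiable external editor lacks.

```python
import numpy as np

def retouch_render(image, exposure, channel_gains, target_rgb, sat_boost, sigma=0.25):
    """Toy renderer. image: float array (H, W, 3) in [0, 1].

    All parameter names are hypothetical, not taken from the paper.
    """
    # Lighting: exposure in stops, applied as a smooth multiplicative gain.
    out = image * (2.0 ** exposure)
    # Global color: one gain per RGB channel.
    out = out * np.asarray(channel_gains, dtype=out.dtype)
    # Specific color: a Gaussian soft mask selects pixels near target_rgb.
    dist2 = np.sum((out - np.asarray(target_rgb, dtype=out.dtype)) ** 2,
                   axis=-1, keepdims=True)
    mask = np.exp(-dist2 / (2.0 * sigma ** 2))
    # Scale the chroma (offset from the per-pixel mean) inside the mask only.
    gray = out.mean(axis=-1, keepdims=True)
    return gray + (1.0 + sat_boost * mask) * (out - gray)

def l2_loss(exposure, image, target):
    """Pixel-level loss as a function of the lighting control alone."""
    pred = retouch_render(image, exposure, (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), 0.0)
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))
# A "ground-truth" retouch that brightens the image by half a stop.
tgt = retouch_render(img, 0.5, (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), 0.0)

# Central finite difference: the loss slope with respect to exposure at 0.
eps = 1e-4
grad = (l2_loss(eps, img, tgt) - l2_loss(-eps, img, tgt)) / (2 * eps)
```

In this toy setup the slope `grad` is negative, meaning that raising the exposure moves the output toward the target. An autodiff framework would compute the same gradient analytically and pass it back to whatever model predicts the controls. The sketch omits clipping to [0, 1] to keep every step smooth.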
