NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Maksim Kuprashevich; Grigorii Alekseenko; Irina Tolstykh; Georgii Fedorov; Bulat Suleimanov; Vladimir Dokholyan; Aleksandr Gordeev

arXiv:2507.14119·cs.CV·September 26, 2025

TL;DR

This paper introduces an automated pipeline for mining high-quality image editing triplets from generative models, enabling large-scale training data creation without human labeling, and releases a substantial dataset and a fine-tuned model.

Contribution

The authors develop a modular, automated system for mining pixel-accurate image editing triplets, which significantly reduces manual effort and enables large-scale, high-fidelity training data generation.

Findings

01. The pipeline mines 720k high-quality triplets.
02. The mined dataset outperforms all public alternatives in cross-dataset evaluation.
03. The approach enables large-scale training without human labeling.

Abstract

Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets (original image, instruction, edited image), yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models.…
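The abstract describes candidate edits being scored by a validator for instruction adherence and aesthetics, with only passing triplets kept. The Python sketch below illustrates that filtering step under stated assumptions: the `Triplet`, `Scores`, and `mine` names, the [0, 1] score range, and the 0.8/0.6 thresholds are all hypothetical, and the validator is a plain callable standing in for the authors' task-tuned Gemini model, which is not called here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triplet:
    """One (original image, instruction, edited image) training example.

    Images are referenced by path purely for illustration (hypothetical).
    """
    original: str
    instruction: str
    edited: str


@dataclass(frozen=True)
class Scores:
    """Hypothetical validator output; both scores assumed to lie in [0, 1]."""
    adherence: float   # how well the edit follows the instruction
    aesthetics: float  # visual appeal of the edited image


# Stand-in for the task-tuned validator: any callable mapping a triplet to scores.
Validator = Callable[[Triplet], Scores]


def mine(candidates: Iterable[Triplet], validate: Validator,
         min_adherence: float = 0.8, min_aesthetics: float = 0.6) -> List[Triplet]:
    """Keep only candidates whose scores clear both (hypothetical) thresholds."""
    kept: List[Triplet] = []
    for triplet in candidates:
        scores = validate(triplet)
        if scores.adherence >= min_adherence and scores.aesthetics >= min_aesthetics:
            kept.append(triplet)
    return kept


if __name__ == "__main__":
    # Toy validator backed by a lookup table instead of a real model.
    fake_scores = {
        "edit_a.png": Scores(adherence=0.95, aesthetics=0.70),
        "edit_b.png": Scores(adherence=0.40, aesthetics=0.90),
        "edit_c.png": Scores(adherence=0.85, aesthetics=0.30),
    }
    pool = [Triplet("src.png", "make the sky pink", name) for name in fake_scores]
    kept = mine(pool, lambda t: fake_scores[t.edited])
    print([t.edited for t in kept])  # ['edit_a.png']
```

Requiring both scores to pass, rather than averaging them, means a visually pleasing edit that ignores the instruction is still rejected, as is an on-instruction edit with poor visual quality.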
