NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, Aleksandr Gordeev

TL;DR
This paper introduces an automated pipeline for mining high-quality image editing triplets from generative models, enabling large-scale training data creation without human labeling, and releases a substantial dataset and a fine-tuned model.
Contribution
The authors develop a modular, automated system for extracting pixel-accurate image editing triplets, significantly reducing manual effort and enabling large-scale high-fidelity training data generation.
Findings
The pipeline successfully mines 720k high-quality triplets.
In cross-dataset evaluation, the mined triplets outperform all public alternatives.
The approach enables large-scale training without human labeling effort.
Abstract
Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets (original image, instruction, edited image), yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models.…
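As a rough illustration of the filtering step the abstract describes, the sketch below keeps only candidate triplets whose validator scores for instruction adherence and aesthetics clear a threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Triplet` fields, the `mine_triplets` function, the score scale, the threshold values, and the `toy_validator` stand-in (used here in place of the task-tuned Gemini validator) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Triplet:
    """One (original image, instruction, edited image) example."""
    original_image: str  # hypothetical: a path or ID for the source image
    instruction: str     # natural-language edit instruction
    edited_image: str    # hypothetical: a path or ID for the edited image


# Assumption: a validator returns (instruction_adherence, aesthetics) scores.
Validator = Callable[[Triplet], Tuple[float, float]]


def mine_triplets(
    candidates: Iterable[Triplet],
    validator: Validator,
    min_adherence: float = 4.0,   # hypothetical threshold
    min_aesthetics: float = 4.0,  # hypothetical threshold
) -> List[Triplet]:
    """Keep candidates whose two validator scores both meet the thresholds."""
    kept = []
    for triplet in candidates:
        adherence, aesthetics = validator(triplet)
        if adherence >= min_adherence and aesthetics >= min_aesthetics:
            kept.append(triplet)
    return kept


def toy_validator(triplet: Triplet) -> Tuple[float, float]:
    """Stand-in scorer with hard-coded scores; a real validator would be a model."""
    scores = {"good.png": (4.5, 4.2), "bad.png": (2.0, 4.8)}
    return scores.get(triplet.edited_image, (0.0, 0.0))


candidates = [
    Triplet("a.png", "remove the car", "good.png"),
    Triplet("a.png", "remove the car", "bad.png"),
]
kept = mine_triplets(candidates, toy_validator)
print([t.edited_image for t in kept])
```

Requiring both scores to pass, rather than averaging them, reflects the abstract's point that an edit must follow the instruction and also retain visual appeal; a high aesthetics score alone does not rescue an edit that ignores the prompt.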
