ScribbleEdit: Synthetic Data for Image Editing with Scribbles and Text

Anya Ji; George Ma; Téa Wright; Yiming Zhang; David M. Chan; Alane Suhr; Somayeh Sojoudi

arXiv:2605.01135 · cs.CV · May 6, 2026

TL;DR

ScribbleEdit is a large-scale synthetic dataset that pairs freehand scribbles with text instructions, intended for training and fine-tuning multimodal models to perform more precise, controllable image editing.

Contribution

The paper presents a large-scale synthetic dataset that enables models to interpret combined scribble and text inputs for improved image editing control.

Findings

01

Fine-tuning on ScribbleEdit improves spatial alignment in edits.

02

Off-the-shelf models struggle to interpret scribble inputs unless they are trained on such data.

03

Synthetic data enhances model capability for detailed image modifications.

Abstract

Recent progress in generative models has significantly advanced image editing capabilities, yet precise and intuitive user control remains difficult. Specifically, users often struggle to communicate both exact spatial layouts and specific semantic details simultaneously. While natural language instructions effectively convey high-level semantics like texture and color, they lack spatial specificity. Conversely, freehand scribbles provide rough spatial boundaries but cannot express detailed visual attributes. Consequently, achieving precise control requires combining both modalities. However, existing models struggle to jointly interpret abstract scribbles alongside text due to a lack of specialized training data. In this work, we introduce ScribbleEdit, a large-scale synthetic dataset designed to bridge this gap by combining natural language instructions with freehand scribble inputs…
