DreamOmni3: Scribble-based Editing and Generation

Bin Xia; Bohao Peng; Jiyang Liu; Sitong Wu; Jingyao Li; Junjia Huang; Xu Zhao; Yitong Wang; Ruihang Chu; Bei Yu; Jiaya Jia

arXiv:2512.22525 · cs.CV · December 30, 2025

TL;DR

DreamOmni3 introduces a flexible scribble-based editing and generation framework that enhances visual editing precision by combining user sketches, images, and instructions, surpassing traditional text-only models.

Contribution

It presents a novel data synthesis pipeline and a joint input scheme for complex graphical edits, along with comprehensive benchmarks for scribble-based visual editing and generation.

Findings

01

Outperforms existing models in scribble-based editing and generation tasks.

02

Effective handling of complex edits with multiple scribbles and instructions.

03

Public release of models and code to facilitate further research.

Abstract

Recently, unified generation and editing models have achieved remarkable success with their impressive performance. These models rely mainly on text prompts for instruction-based editing and generation, but language often fails to capture users' intended edit locations and fine-grained visual details. To this end, we propose two tasks, scribble-based editing and scribble-based generation, that enable more flexible creation in a graphical user interface (GUI) by combining the user's text, images, and freehand sketches. We introduce DreamOmni3, which tackles two challenges: data creation and framework design. Our data synthesis pipeline has two parts: scribble-based editing and scribble-based generation. For scribble-based editing, we define four tasks: scribble and instruction-based editing, scribble and multimodal instruction-based editing, image fusion, and doodle editing. Based on the DreamOmni2 dataset, we extract editable…
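To make the four scribble-based editing tasks named in the abstract concrete, the following is a minimal Python sketch of how a request combining text, images, and freehand scribbles might be represented. It is not the DreamOmni3 API: the names `ScribbleEditTask`, `Scribble`, `EditRequest`, and `validate`, as well as the rule that every request carries at least one scribble, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ScribbleEditTask(Enum):
    """The four scribble-based editing tasks listed in the abstract."""
    SCRIBBLE_INSTRUCTION = "scribble and instruction-based editing"
    SCRIBBLE_MULTIMODAL_INSTRUCTION = "scribble and multimodal instruction-based editing"
    IMAGE_FUSION = "image fusion"
    DOODLE_EDITING = "doodle editing"


# A point in image pixel coordinates (x, y). Hypothetical representation.
Point = Tuple[float, float]


@dataclass
class Scribble:
    """One freehand stroke drawn by the user (hypothetical structure)."""
    points: List[Point]
    color: str = "red"


@dataclass
class EditRequest:
    """A joint input of text, images, and scribbles (hypothetical structure)."""
    task: ScribbleEditTask
    source_image: str                      # path to the image being edited
    instruction: str                       # textual instruction
    scribbles: List[Scribble] = field(default_factory=list)
    reference_images: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # Assumption for this sketch: every request needs at least one
        # non-empty scribble to mark the edit location.
        if not any(s.points for s in self.scribbles):
            raise ValueError("at least one non-empty scribble is required")


request = EditRequest(
    task=ScribbleEditTask.SCRIBBLE_INSTRUCTION,
    source_image="input.png",
    instruction="Replace the circled object with a red umbrella.",
    scribbles=[Scribble(points=[(10.0, 12.0), (40.0, 15.0), (38.0, 50.0)])],
)
request.validate()  # passes: one scribble with three points
```

Keeping the task as an explicit enum makes it easy to route each request to task-specific data synthesis or evaluation code, whatever form that code takes in the actual release.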

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques