DreamOmni3: Scribble-based Editing and Generation
Bin Xia, Bohao Peng, Jiyang Liu, Sitong Wu, Jingyao Li, Junjia Huang, Xu Zhao, Yitong Wang, Ruihang Chu, Bei Yu, Jiaya Jia

TL;DR
DreamOmni3 introduces a flexible scribble-based editing and generation framework that enhances visual editing precision by combining user sketches, images, and instructions, surpassing traditional text-only models.
Contribution
It presents a novel data synthesis pipeline and a joint input scheme for complex graphical edits, along with comprehensive benchmarks for scribble-based visual editing and generation.
Findings
Outperforms existing models in scribble-based editing and generation tasks.
Effective handling of complex edits with multiple scribbles and instructions.
Public release of models and code to facilitate further research.
Abstract
Recently, unified generation and editing models have achieved remarkable success with their impressive performance. These models rely mainly on text prompts for instruction-based editing and generation, but language often fails to capture users' intended edit locations and fine-grained visual details. To this end, we propose two tasks, scribble-based editing and scribble-based generation, that enable more flexible creation in a graphical user interface (GUI) by combining users' text, images, and freehand sketches. We introduce DreamOmni3, which tackles two challenges: data creation and framework design. Our data synthesis pipeline includes two parts: scribble-based editing and generation. For scribble-based editing, we define four tasks: scribble and instruction-based editing, scribble and multimodal instruction-based editing, image fusion, and doodle editing. Based on the DreamOmni2 dataset, we extract editable…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
