DreamOmni: Unified Image Generation and Editing

Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, and Jiaya Jia

arXiv:2412.17098 · cs.CV · October 3, 2025

TL;DR

DreamOmni is a single model for both text-to-image generation and image editing. It relies on a synthetic data pipeline to create editing data and on joint training across tasks to improve performance.

Contribution

The paper introduces DreamOmni, a novel unified framework that integrates text-to-image generation and editing tasks, along with a synthetic data pipeline for efficient dataset creation.

Findings

01. Enhanced image editing performance through joint training
02. Effective synthetic data pipeline for high-quality datasets
03. Unified framework improves generation and editing quality

Abstract

Currently, the success of large language models (LLMs) illustrates that a unified multitasking approach can significantly enhance model usability, streamline deployment, and foster synergistic benefits across different tasks. However, in computer vision, while text-to-image (T2I) models have significantly improved generation quality through scaling up, their framework design did not initially consider how to unify with downstream tasks, such as various types of editing. To address this, we introduce DreamOmni, a unified model for image generation and editing. We begin by analyzing existing frameworks and the requirements of downstream tasks, proposing a unified framework that integrates both T2I models and various editing tasks. Furthermore, another key challenge is the efficient creation of high-quality editing data, particularly for instruction-based and drag-based editing. To this…

Taxonomy

Topics: Computer Graphics and Visualization Techniques