UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao

TL;DR
UniReal is a unified framework that models image generation and editing tasks as discontinuous video generation, leveraging large-scale videos to learn real-world dynamics and enable diverse visual tasks.
Contribution
It introduces a novel approach that treats image tasks as video generation, unifying various tasks under a single framework using video-based supervision.
Findings
Handles shadows, reflections, pose variation, and object interaction effectively.
Supports diverse tasks like generation, editing, and composition seamlessly.
Demonstrates emergent capabilities for new applications.
Abstract
We introduce UniReal, a unified framework designed to address various image generation and editing tasks. Existing solutions often vary by tasks, yet share fundamental principles: preserving consistency between inputs and outputs while capturing visual variations. Inspired by recent video generation models that effectively balance consistency and variation across frames, we propose a unifying approach that treats image-level tasks as discontinuous video generation. Specifically, we treat varying numbers of input and output images as frames, enabling seamless support for tasks such as image generation, editing, customization, composition, etc. Although designed for image-level tasks, we leverage videos as a scalable source for universal supervision. UniReal learns world dynamics from large-scale videos, demonstrating advanced capability in handling shadows, reflections, pose variation,…
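To make the "images as frames" framing concrete, here is a minimal sketch of how different image tasks might be packed into one frame sequence. It is an illustration under stated assumptions, not the authors' implementation: the `Frame` class, the `build_frame_sequence` helper, the role labels, and the file names are all hypothetical, and image data is represented by placeholder strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """One slot in the pseudo-video (hypothetical structure, not from the paper)."""
    role: str             # "input" for a given image, "output" for an image to generate
    image: Optional[str]  # placeholder for pixel data; None for frames still to be generated


def build_frame_sequence(input_images: List[str], num_outputs: int) -> List[Frame]:
    """Pack any number of input images and output slots into one frame list."""
    if num_outputs < 1:
        raise ValueError("at least one output frame is required")
    frames = [Frame("input", img) for img in input_images]
    frames += [Frame("output", None) for _ in range(num_outputs)]
    return frames


# Different tasks differ only in how many input and output frames they use.
generation = build_frame_sequence([], 1)                                  # no image input
editing = build_frame_sequence(["photo.png"], 1)                          # one image in, one out
composition = build_frame_sequence(["object.png", "background.png"], 1)   # two in, one out
```

In this sketch the task type is never passed explicitly; it is implied by the number of input and output frames, which is the sense in which a single frame-based model could cover generation, editing, and composition.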
Taxonomy
Topics: Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques · Human Motion and Animation
