Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation
Peiyu Wang, Yi Peng, Yimeng Gan, Liang Hu, Tianyidan Xie, Xiaokun Wang, Yichen Wei, Chuanxin Tang, Bo Zhu, Changshi Li, Hongyang Wei, Eric Li, Xuchen Song, Yang Liu, Yahui Zhou

TL;DR
Skywork UniPic is a versatile 1.5B autoregressive model that unifies image understanding, generation, and editing, achieving state-of-the-art results efficiently on standard hardware.
Contribution
It introduces a unified architecture with a decoupled encoding strategy and a progressive training schedule, enabling high-performance multimodal tasks with limited resources.
Findings
Achieves a GenEval score of 0.86, surpassing most existing unified models
Sets a new DPG-Bench record of 85.5 for complex generation
Generates 1024x1024 images using under 15 GB of GPU memory
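As a rough sanity check on the memory figure, the sketch below estimates how much memory the weights of a 1.5B-parameter model occupy on their own. The storage precisions are assumptions, since this listing does not state the inference dtype. The estimate ignores activations, caches, and decoding buffers, which account for the rest of the budget.

```python
# Back-of-the-envelope weight-memory estimate (illustrative only).
PARAMS = 1.5e9  # parameter count stated for Skywork UniPic

# Bytes per parameter for common storage precisions.
# These precisions are assumptions, not taken from the paper.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(PARAMS, width):.1f} GB")
```

Under these assumptions the weights take 3.0 GB at 16-bit precision and 6.0 GB at 32-bit precision, which leaves most of a 15 GB budget for everything else involved in producing a 1024 × 1024 image.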
Abstract
We introduce Skywork UniPic, a 1.5 billion-parameter autoregressive model that unifies image understanding, text-to-image generation, and image editing within a single architecture, eliminating the need for task-specific adapters or inter-module connectors, and demonstrate that compact multimodal systems can achieve state-of-the-art performance on commodity hardware. Skywork UniPic achieves a GenEval score of 0.86, surpassing most existing unified models; sets a new DPG-Bench complex-generation record of 85.5; attains 5.83 on GEditBench-EN and 3.49 on ImgEdit-Bench for image editing; and generates 1024 × 1024 images with under 15 GB of GPU memory (e.g., RTX 4090). Its design includes (1) a decoupled encoding strategy that leverages a masked autoregressive encoder for synthesis and a SigLIP2 encoder for understanding, both feeding a shared autoregressive decoder; (2) a progressive, resolution-aware training…
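The decoupled encoding idea can be pictured as a routing step: each task picks its own visual encoder, and every path ends in the same decoder. The toy sketch below shows only that control flow. All function names, the task labels, and the string "tokens" are hypothetical placeholders, and what each encoder actually consumes and emits is not specified in this listing.

```python
from typing import Callable, Dict, List

Tokens = List[str]  # stand-in for a sequence of embeddings


def mar_encoder(visual_input: str) -> Tokens:
    """Placeholder for the masked autoregressive encoder (synthesis path)."""
    return [f"mar({visual_input})"]


def siglip2_encoder(visual_input: str) -> Tokens:
    """Placeholder for the SigLIP2 encoder (understanding path)."""
    return [f"siglip2({visual_input})"]


def shared_decoder(sequence: Tokens) -> str:
    """Placeholder for the single autoregressive decoder shared by all tasks."""
    return "decode[" + ", ".join(sequence) + "]"


# Hypothetical routing table: the task label selects the encoder.
ENCODER_FOR_TASK: Dict[str, Callable[[str], Tokens]] = {
    "understand": siglip2_encoder,
    "generate": mar_encoder,
}


def run(task: str, prompt: str, visual_input: str) -> str:
    """Encode the visual input with the task's encoder, then decode jointly."""
    visual_tokens = ENCODER_FOR_TASK[task](visual_input)
    return shared_decoder(visual_tokens + [f"text({prompt})"])


print(run("understand", "what is shown?", "photo"))
print(run("generate", "a red bicycle", "canvas"))
```

The point of the layout is that only the dictionary entry differs between tasks; the decoder call is identical, which is what lets one set of decoder weights serve both understanding and synthesis.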
Code & Models
- 🤗 Skywork/Skywork-UniPic-1.5B (model) · 66 dl · ♡ 115
- 🤗 Skywork/UniPic2-Metaquery-9B (model) · 29 dl · ♡ 20
- 🤗 Skywork/UniPic2-SD3.5M-Kontext-2B (model) · 24 dl · ♡ 24
- 🤗 Skywork/UniPic2-Metaquery-GRPO-9B (model) · 27 dl · ♡ 6
- 🤗 Skywork/UniPic2-SD3.5M-Kontext-GRPO-2B (model) · 14 dl · ♡ 11
- 🤗 Skywork/Unipic3 (model) · 35 dl · ♡ 21
- 🤗 Skywork/UniPic2-Metaquery-Flash (model) · 8 dl · ♡ 6
- 🤗 Skywork/UniPic2-Metaquery-GRPO-Flash (model) · 7 dl · ♡ 5
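To fetch one of these checkpoints locally, a minimal sketch using the `huggingface_hub` package might look like the following. The repository ID is the one listed above. The helper names and the `checkpoints` target directory are illustrative assumptions, and how the files are then loaded depends on the authors' inference code, which this listing does not describe.

```python
from pathlib import Path

REPO_ID = "Skywork/Skywork-UniPic-1.5B"  # repository ID as listed above


def hub_url(repo_id: str) -> str:
    """Build the Hugging Face model page URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def fetch_checkpoint(repo_id: str = REPO_ID, target: str = "checkpoints") -> Path:
    """Download all files of the repository and return the local directory.

    `target` is an arbitrary local folder chosen for this example.
    """
    # Imported lazily so this module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = Path(target) / repo_id.split("/")[-1]
    return Path(snapshot_download(repo_id=repo_id, local_dir=str(local_dir)))


if __name__ == "__main__":
    print(hub_url(REPO_ID))
```

Calling `fetch_checkpoint()` requires network access and enough disk space for the full repository; `hub_url` only formats a string and can be used to check the model page first.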
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Advanced Image and Video Retrieval Techniques
