ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Junying Chen; Zhenyang Cai; Pengcheng Chen; Shunian Chen; Ke Ji; Xidong Wang; Yunjin Yang; Benyou Wang

arXiv:2506.18095·cs.CV·June 24, 2025

TL;DR

This paper introduces ShareGPT-4o-Image, a synthetic dataset of 91K samples generated with GPT-4o, and Janus-4o, a multimodal model trained on it that supports both text-to-image and text-and-image-to-image generation, with the aim of opening up photorealistic, instruction-aligned image generation to open research.

Contribution

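The paper presents ShareGPT-4o-Image, which the authors describe as the first dataset of text-to-image and text-and-image-to-image samples synthesized entirely with GPT-4o's image generation, and a new model, Janus-4o, that improves text-to-image quality over its predecessor, Janus-Pro, and adds text-and-image-to-image generation using limited training data.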

Findings

01. Janus-4o outperforms previous models in text-to-image generation.

02. Janus-4o successfully performs text-and-image-to-image generation from scratch.

03. The approach achieves high-quality results with only 91K synthetic samples and 6 hours of training.

Abstract

Recent advances in multimodal generative models have unlocked photorealistic, instruction-aligned image generation, yet leading systems like GPT-4o-Image remain proprietary and inaccessible. To democratize these capabilities, we present ShareGPT-4o-Image, the first dataset comprising 45K text-to-image and 46K text-and-image-to-image data, all synthesized using GPT-4o's image generation capabilities for distilling its advanced image generation abilities. Leveraging this dataset, we develop Janus-4o, a multimodal large language model capable of both text-to-image and text-and-image-to-image generation. Janus-4o not only significantly improves text-to-image generation over its predecessor, Janus-Pro, but also newly supports text-and-image-to-image generation. Notably, it achieves impressive performance in text-and-image-to-image generation from scratch, using only 91K synthetic samples and…
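The dataset split quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are the rounded figures from the abstract, and the `composition` helper and dictionary keys are hypothetical names introduced here, not part of the released dataset or code.

```python
# Rounded sample counts as stated in the abstract (not exact file counts).
TEXT_TO_IMAGE = 45_000
TEXT_AND_IMAGE_TO_IMAGE = 46_000


def composition(counts):
    """Return the total sample count and each subset's share of the total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


# Hypothetical labels for the two subsets described in the abstract.
counts = {
    "text-to-image": TEXT_TO_IMAGE,
    "text-and-image-to-image": TEXT_AND_IMAGE_TO_IMAGE,
}
total, shares = composition(counts)

print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The two subsets add up to the 91K samples cited for training, split roughly evenly between the two tasks.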

Code & Models

Repositories

freedomintelligence/sharegpt-4o-image
PyTorch · Official

Models

🤗
FreedomIntelligence/Janus-4o-7B
model· 30 dl· ♡ 49

Datasets

FreedomIntelligence/ShareGPT-4o-Image
dataset· 1.6k dl
