LongCat-Image Technical Report
Meituan LongCat Team: Hanghang Ma, Haoxian Tan, Jiale Huang, Junqiang Wu, Jun-Yan He, Lishuai Gao, Songlin Xiao, Xiaoming Wei, Xiaoqi Ma, Xunliang Cai, Yayong Guan, Jie Hu

TL;DR
LongCat-Image is a compact, open-source bilingual foundation model for image generation and editing, achieving state-of-the-art multilingual rendering, photorealism, and efficiency, with extensive community resources.
Contribution
It introduces a new open-source bilingual model with rigorous data curation, superior Chinese character rendering, and a compact design for efficient deployment.
Findings
Achieves SOTA text-rendering and photorealism.
Supports complex Chinese characters better than existing solutions.
Maintains high performance with only 6B parameters.
Abstract
We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility prevalent in current leading models. 1) We achieve this through rigorous data curation strategies across the pre-training, mid-training, and SFT stages, complemented by the coordinated use of curated reward models during the RL phase. This strategy establishes the model as a new state-of-the-art (SOTA), delivering superior text-rendering capabilities and remarkable photorealism, and significantly enhancing aesthetic quality. 2) Notably, it sets a new industry standard for Chinese character rendering. By supporting even complex and rare characters, it outperforms both major open-source and commercial solutions in coverage, while also…
Code & Models
- 🤗 meituan-longcat/LongCat-Image-Edit (model) · 25k downloads · ♡ 167
- 🤗 meituan-longcat/LongCat-Image (model) · 19k downloads · ♡ 239
- 🤗 meituan-longcat/LongCat-Image-Edit-Turbo (model) · 35k downloads · ♡ 55
- 🤗 meituan-longcat/LongCat-Image-Dev (model) · 673 downloads · ♡ 46
- 🤗 Tom0by/LongCat-Image (model)
- 🤗 er6y/LongCat-Image-Edit-Turbo-MNN-int8 (model) · 4 downloads
- 🤗 soralip543/LongCat-Image-Dev (model)
- 🤗 models123/LongCat-Image-Dev (model) · 7 downloads
- 🤗 models123/LongCat-Image-Edit (model) · 6 downloads
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
