LongCat-Image Technical Report

Meituan LongCat Team: Hanghang Ma; Haoxian Tan; Jiale Huang; Junqiang Wu; Jun-Yan He; Lishuai Gao; Songlin Xiao; Xiaoming Wei; Xiaoqi Ma; Xunliang Cai; Yayong Guan; Jie Hu

arXiv:2512.07584 · cs.CV · December 9, 2025

TL;DR

LongCat-Image is a compact, open-source bilingual (Chinese-English) foundation model for image generation and editing. It achieves state-of-the-art multilingual text rendering and photorealism, is efficient to deploy, and is released with extensive community resources.

Contribution

It introduces a new open-source bilingual model with rigorous data curation, superior Chinese character rendering, and a compact design for efficient deployment.

Findings

01

Achieves state-of-the-art (SOTA) text rendering and photorealism.

02

Renders complex and rare Chinese characters better than existing open-source and commercial solutions.

03

Maintains high performance with only 6B parameters.
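The findings emphasize compactness. As a rough, back-of-the-envelope illustration of what a 6B-parameter budget means for deployment, the sketch below estimates the memory needed only to store the weights at a few common numeric precisions. It is an assumption-laden estimate, not the team's deployment setup: it treats all 6B parameters as held in memory at one precision, and it ignores activations, any separately counted text encoder or VAE, and framework overhead. The precision list and the function name `weight_memory_gib` are illustrative choices.

```python
# Back-of-the-envelope weight-memory estimate for a 6B-parameter model.
# Assumption: all parameters are stored at a single precision; activations,
# auxiliary networks, and runtime overhead are NOT included.

PARAMS = 6_000_000_000  # parameter count quoted in the findings

# Bytes per parameter for some common storage precisions (illustrative).
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "int8": 1,
}


def weight_memory_gib(n_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(PARAMS, dtype):.1f} GiB")
    # fp32: 22.4 GiB
    # bf16: 11.2 GiB
    # int8: 5.6 GiB
```

Because activations and other components are excluded, real peak memory at inference will be higher than these figures; the point is only that halving bytes per parameter halves the weight footprint.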

Abstract

We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges prevalent in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility. 1) We achieve this through rigorous data curation strategies across the pre-training, mid-training, and SFT stages, complemented by the coordinated use of curated reward models during the RL phase. This strategy establishes the model as the new state of the art (SOTA), delivering superior text-rendering capabilities and remarkable photorealism, and significantly enhancing aesthetic quality. 2) Notably, it sets a new industry standard for Chinese character rendering. By supporting even complex and rare characters, it outperforms both major open-source and commercial solutions in coverage, while also…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications