ThinkGen: Generalized Thinking for Visual Generation

Siyu Jiao; Yiheng Lin; Yujie Zhong; Qi She; Wei Zhou; Xiaohan Lan; Zilong Huang; Fei Yu; Yingchen Yu; Yunqing Zhao; Yao Zhao; Yunchao Wei

arXiv:2512.23568 · cs.CV · December 30, 2025

TL;DR

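ThinkGen is a think-driven visual generation framework: a multimodal large language model (MLLM) uses Chain-of-Thought (CoT) reasoning to turn user intent into tailored instructions, and a Diffusion Transformer (DiT) renders images from those instructions. The stated goal is generation that generalizes across scenarios instead of relying on scenario-specific mechanisms.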

Contribution

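The authors present ThinkGen as the first think-driven visual generation framework to explicitly leverage an MLLM's CoT reasoning across generation scenarios. It pairs a decoupled architecture (pretrained MLLM plus DiT) with a separable GRPO-based training paradigm, SepGRPO, that alternates reinforcement learning between the two modules.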

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Demonstrates effective reasoning in diverse generative tasks.

03. Flexible joint training across datasets enhances performance.

Abstract

Recent progress in Multimodal Large Language Models (MLLMs) demonstrates that Chain-of-Thought (CoT) reasoning enables systematic solutions to complex understanding tasks. However, its extension to generation tasks remains nascent and limited by scenario-specific mechanisms that hinder generalization and adaptation. In this work, we present ThinkGen, the first think-driven visual generation framework that explicitly leverages the MLLM's CoT reasoning in various generation scenarios. ThinkGen employs a decoupled architecture comprising a pretrained MLLM and a Diffusion Transformer (DiT), wherein the MLLM generates tailored instructions based on user intent, and the DiT produces high-quality images guided by these instructions. We further propose a separable GRPO-based training paradigm (SepGRPO), alternating reinforcement learning between the MLLM and DiT modules. This flexible design enables…
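The alternating scheme the abstract describes can be illustrated with a minimal sketch. The abstract does not give SepGRPO's details, so everything here is an assumption made for illustration: the function names, the phase schedule, and the use of the standard group-relative advantage from GRPO (each sampled output's reward normalized by the mean and standard deviation of its own sample group). It is not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds the scalar rewards of several outputs sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero when
    every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def alternating_schedule(num_phases, first="mllm"):
    """Hypothetical schedule: the module named in each phase is trained
    with RL while the other module stays frozen."""
    order = ["mllm", "dit"] if first == "mllm" else ["dit", "mllm"]
    return [order[i % 2] for i in range(num_phases)]


if __name__ == "__main__":
    # Four images sampled for one prompt, scored by some reward function.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    # Which module is updated in each of four training phases.
    print(alternating_schedule(4))
```

In this sketch the two modules never receive gradient updates in the same phase, which is one plausible reading of "separable"; the actual phase lengths, reward functions, and freezing strategy would need to be taken from the paper itself.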

Code & Models

Models

🤗 JSYuuu/ThinkGen · model · 17 downloads · 1 like

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning