Show, Don't Tell: Morphing Latent Reasoning into Image Generation

Harold Haodong Chen; Xinxiang Yin; Wen-Jie Shu; Hongfei Zhang; Zixin Zhang; Chenfei Liao; Litao Guo; Qifeng Chen; Ying-Cong Chen

arXiv:2602.02227·cs.CV·February 3, 2026

TL;DR

LatentMorph is a latent reasoning framework for text-to-image generation that reasons in continuous latent space rather than decoding intermediate thoughts into explicit text, improving quality, efficiency, and cognitive alignment.

Contribution

The paper presents a latent reasoning approach built from four lightweight components that enable adaptive, efficient, and implicit reasoning during image generation.

Findings

01. Improves generation quality by 16-25% on key benchmarks.

02. Outperforms explicit reasoning methods by 11-15% on reasoning tasks.

03. Reduces inference time by 44% and token use by 51%.

Abstract

Text-to-image (T2I) generation has achieved remarkable progress, yet existing methods often lack the ability to dynamically reason and refine during generation, a hallmark of human creativity. Current reasoning-augmented paradigms mostly rely on explicit thought processes, where intermediate reasoning is decoded into discrete text at fixed steps with frequent image decoding and re-encoding, leading to inefficiencies, information loss, and cognitive mismatches. To bridge this gap, we introduce LatentMorph, a novel framework that seamlessly integrates implicit latent reasoning into the T2I generation process. At its core, LatentMorph introduces four lightweight components: (i) a condenser for summarizing intermediate generation states into compact visual memory, (ii) a translator for converting latent thoughts into actionable guidance, (iii) a shaper for dynamically steering next image…

Code & Models

Models

🤗 CheeseStar/LatentMorph (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Aesthetic Perception and Analysis