Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Chenxi Zhao; Chen Zhu; Xiaokun Feng; Aiming Hao; Jiashu Zhu; Jiachen Lei; Jiahong Wu; Xiangxiang Chu; Jufeng Yang

arXiv:2604.18168 · cs.CV · April 21, 2026

TL;DR

This paper extends one-step image generation from class labels to text inputs by integrating LLM-based text encoders into the MeanFlow framework, enabling richer content creation with improved performance.

Contribution

The paper introduces an approach for incorporating powerful text encoders into MeanFlow for text-conditioned image synthesis, addressing the need for highly discriminative text features in limited-step generation.

Findings

01 Successful adaptation of MeanFlow for text-conditioned image generation.

02 Significant improvements in generation quality demonstrated on diffusion models.

03 Analysis reveals the importance of high discriminability in text features for limited-step generation.

Abstract

Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existing research on MeanFlow primarily focuses on class-to-image generation. However, an intuitive yet unexplored direction is to extend the condition from fixed class labels to flexible text inputs, enabling richer content creation. Compared to the limited class labels, text conditions pose greater challenges to the model's understanding capability, necessitating the effective integration of powerful text encoders into the MeanFlow framework. Surprisingly, although incorporating text conditions appears straightforward, we find that integrating powerful LLM-based text encoders using conventional training strategies results in unsatisfactory performance. To uncover the underlying cause, we conduct detailed analyses and reveal that, due to the…

Code & Models

Repositories

AMAP-ML/EMF (GitHub)
