CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
Lichen Ma, Tiezhu Yue, Pei Fu, Yujie Zhong, Kai Zhou, Xiaoming Wei, Jie Hu

TL;DR
CharGen is a novel character-level visual text generation model that uses a multimodal encoder and perceptual loss to significantly improve rendering accuracy, especially for complex and Chinese text, outperforming recent methods.
Contribution
It introduces a character-level multimodal encoder and perceptual loss, enhancing accuracy and addressing stroke inaccuracies in visual text generation.
Findings
Outperforms recent methods by more than 8% in benchmark accuracy.
Achieves 5.5% higher accuracy on Chinese text datasets.
Effectively captures fine-grained cross-modality features.
Abstract
Recently, significant advancements have been made in diffusion-based visual text generation models. Although the effectiveness of these methods in visual text rendering is rapidly improving, they still encounter challenges such as inaccurate characters and strokes when rendering complex visual text. In this paper, we propose CharGen, a highly accurate character-level visual text generation and editing model. Specifically, CharGen employs a character-level multimodal encoder that not only extracts character-level text embeddings but also encodes glyph images character by character. This enables it to capture fine-grained cross-modality features more effectively. Additionally, we introduce a new perceptual loss in CharGen to enhance character shape supervision and address the issue of inaccurate strokes in generated text. It is worth mentioning that CharGen can be integrated into existing…
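As a rough illustration of the perceptual-loss idea mentioned in the abstract, the sketch below compares a generated glyph with its target in a feature space rather than pixel by pixel, and averages the result character by character. The excerpt does not say which feature extractor or loss formulation CharGen uses, so everything here is an assumption: `edge_features` is a hypothetical hand-written stand-in for a learned network, and the names `perceptual_loss` and `per_character_loss` are illustrative only.

```python
from typing import List

Glyph = List[List[float]]  # 2D grid of pixel intensities in [0, 1]


def edge_features(img: Glyph) -> List[float]:
    """Hypothetical stand-in for a learned feature extractor.

    Returns horizontal and vertical finite differences, which respond
    to stroke edges. A real system would likely use a pretrained network.
    """
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for y in range(h):              # horizontal differences
        for x in range(w - 1):
            feats.append(img[y][x + 1] - img[y][x])
    for y in range(h - 1):          # vertical differences
        for x in range(w):
            feats.append(img[y + 1][x] - img[y][x])
    return feats


def perceptual_loss(generated: Glyph, target: Glyph) -> float:
    """Mean squared distance between the two glyphs' feature vectors."""
    fg, ft = edge_features(generated), edge_features(target)
    return sum((a - b) ** 2 for a, b in zip(fg, ft)) / len(fg)


def per_character_loss(generated: List[Glyph], target: List[Glyph]) -> float:
    """Average the feature-space loss over aligned character crops."""
    losses = [perceptual_loss(g, t) for g, t in zip(generated, target)]
    return sum(losses) / len(losses)


# A 3x3 vertical stroke, and a copy with the middle pixel missing.
stroke = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
broken = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

clean_loss = perceptual_loss(stroke, stroke)    # identical glyphs
broken_loss = perceptual_loss(broken, stroke)   # broken stroke is penalized
```

Scoring each character crop separately means a single broken stroke is penalized in its own term before averaging, which is the intuition behind character-level supervision; the actual CharGen loss may differ in every detail.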
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Computational and Text Analysis Methods
Methods: Diffusion
