One-Shot Multilingual Font Generation Via ViT

Zhiheng Wang; Jiarui Liu

arXiv:2412.11342 · cs.CV · December 17, 2024

TL;DR

This paper presents a ViT-based model for multilingual font generation that handles both logographic and alphabetic scripts, using Masked Autoencoding (MAE) pretraining and retrieval-augmented guidance to generate high-quality fonts for unseen characters.

Contribution

Introduces a ViT-based framework with retrieval-augmented guidance for scalable, high-quality multilingual font generation, removing the need for the complex design components used in prior frameworks.

Findings

01 Effective font generation across multiple languages.

02 High-quality results for unseen and user-crafted characters.

03 Enhanced scalability and adaptability demonstrated.

Abstract

Font design poses unique challenges for logographic languages like Chinese, Japanese, and Korean (CJK), where thousands of unique characters must be individually crafted. This paper introduces a novel Vision Transformer (ViT)-based model for multi-language font generation, effectively addressing the complexities of both logographic and alphabetic scripts. By leveraging ViT and pretraining with a strong visual pretext task (Masked Autoencoding, MAE), our model eliminates the need for complex design components in prior frameworks while achieving comprehensive results with enhanced generalizability. Remarkably, it can generate high-quality fonts across multiple languages for unseen, unknown, and even user-crafted characters. Additionally, we integrate a Retrieval-Augmented Guidance (RAG) module to dynamically retrieve and adapt style references, improving scalability and real-world…
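The MAE pretext task mentioned in the abstract can be illustrated with a minimal sketch of its masking step: split a glyph image into patches, hide a random subset, and pass only the visible patches to the encoder. This is not the authors' implementation. The helper names `patchify` and `random_mask`, the toy 8x8 glyph, the 2x2 patch size, and the 0.75 mask ratio (the default in the original MAE work) are all assumptions made for illustration.

```python
import random


def patchify(image, patch):
    """Split a square 2D image (list of rows) into non-overlapping
    patch x patch tiles, returned in row-major order."""
    size = len(image)
    assert size % patch == 0 and all(len(row) == size for row in image)
    patches = []
    for top in range(0, size, patch):
        for left in range(0, size, patch):
            tile = [image[r][left:left + patch] for r in range(top, top + patch)]
            patches.append(tile)
    return patches


def random_mask(num_patches, mask_ratio, rng):
    """Pick a random subset of patch indices to hide.
    Returns (visible_ids, masked_ids), each sorted ascending."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Toy 8x8 binary "glyph" (hypothetical data, not a real font image).
glyph = [[(r * 8 + c) % 2 for c in range(8)] for r in range(8)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
patches = patchify(glyph, patch=2)  # 16 patches of size 2x2
visible_ids, masked_ids = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible patches would be fed to the ViT encoder; a decoder
# would then be trained to reconstruct the pixels of the masked ones.
encoder_input = [patches[i] for i in visible_ids]
```

In a real pipeline the patches would be tensors and the reconstruction loss would be computed on the masked patches only. The sketch shows just the index bookkeeping that makes the task self-supervised.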

Taxonomy

Topics: Web Data Mining and Analysis

Methods: Attention Is All You Need · Linear Layer · Adam · Vision Transformer · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Byte Pair Encoding