XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation
Wei Liu, Fangyue Liu, Fei Ding, Qian He, Zili Yi

TL;DR
This paper introduces XMP-Font, a self-supervised cross-modality pre-training approach with a transformer encoder for few-shot font generation, effectively capturing complex style features without fine-tuning.
Contribution
It proposes a novel self-supervised pre-training strategy and a cross-modality transformer encoder to improve style representation in few-shot font generation.
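The summary does not spell out the pre-training objective, so the following is only a hedged illustration of what "self-supervised cross-modality pre-training" can look like. It assumes the two modalities are glyph-image patches and stroke labels concatenated into one sequence. It also assumes a BERT-style masked-token pretext task. The token names, the 25% mask ratio, and every function name are hypothetical. No transformer is included. The sketch covers only the input preparation and the masked-position scoring rule such an objective relies on.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"  # hypothetical mask token


def build_joint_sequence(
    patch_tokens: List[str], stroke_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Concatenate both modalities into one transformer input sequence.

    Also returns a parallel list of modality ids (0 = image patch,
    1 = stroke label) that a transformer would map to modality embeddings.
    """
    tokens = patch_tokens + stroke_tokens
    modality_ids = [0] * len(patch_tokens) + [1] * len(stroke_tokens)
    return tokens, modality_ids


def mask_tokens(
    tokens: List[str], ratio: float, rng: random.Random
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of tokens with MASK; returns (masked, positions)."""
    n_mask = min(len(tokens), max(1, round(ratio * len(tokens))))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK
    return masked, positions


def masked_accuracy(
    predictions: List[str], targets: List[str], positions: List[int]
) -> float:
    """Score only the masked positions, as masked-token objectives do."""
    hits = sum(predictions[p] == targets[p] for p in positions)
    return hits / len(positions)


# Usage: hypothetical patch ids and stroke-type labels for one glyph.
patches = ["p0", "p1", "p2", "p3"]
strokes = ["heng", "shu", "pie", "dian"]
tokens, modality = build_joint_sequence(patches, strokes)
masked, positions = mask_tokens(tokens, ratio=0.25, rng=random.Random(0))
print(masked)
# A "model" that returns the original tokens scores 1.0 on the masked slots.
print(masked_accuracy(tokens, tokens, positions))
```

Because masked positions can fall in either modality, recovering a hidden stroke label may require information from the image patches, and the reverse. This is the intuition behind pre-training on both modalities jointly rather than separately.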
Findings
Successfully transfers styles at all scales.
Requires only one reference glyph.
Achieves 28% fewer bad cases than state-of-the-art methods.
Abstract
Generating a new font library is a very labor-intensive and time-consuming job for glyph-rich scripts. Few-shot font generation is therefore desirable, as it requires only a few reference glyphs and no fine-tuning at test time. Existing methods follow the style-content disentanglement paradigm and expect novel fonts to be produced by combining the style codes of the reference glyphs with the content representations of the source. However, these few-shot font generation methods either fail to capture content-independent style representations or employ localized, component-wise style representations, which are insufficient to model many Chinese font styles that involve hyper-component features such as inter-component spacing and "connected-stroke". To resolve these drawbacks and make the style representations more reliable, we propose a self-supervised cross-modality pre-training strategy and a…
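The style-content disentanglement paradigm that the abstract describes (and critiques) can be sketched at the level of codes. This is a generic illustration, not XMP-Font's architecture. The vectors are hand-written stand-ins for encoder outputs. Averaging the per-reference style codes and concatenating the result with the content code are simplifying assumptions. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_style_code(reference_codes: Sequence[Vector]) -> Vector:
    """Average the style codes of k reference glyphs (k = 1 is the one-shot case)."""
    if not reference_codes:
        raise ValueError("at least one reference glyph is required")
    k = len(reference_codes)
    dim = len(reference_codes[0])
    return [sum(code[i] for code in reference_codes) / k for i in range(dim)]


def decoder_input(reference_codes: Sequence[Vector], content_code: Vector) -> Vector:
    """Combine one font-level style code with the source glyph's content code.

    Concatenation is used here for simplicity; a real generator would feed
    this (or a modulated variant of it) to a learned image decoder.
    """
    return mean_style_code(reference_codes) + list(content_code)


# Usage with hand-written stand-ins for encoder outputs.
few_shot = decoder_input([[1.0, 2.0], [3.0, 4.0]], [0.5])
one_shot = decoder_input([[1.0, 2.0]], [0.5])
print(few_shot)  # [2.0, 3.0, 0.5]
print(one_shot)  # [1.0, 2.0, 0.5]
```

The sketch makes the abstract's concern concrete. Everything the generator knows about the target font must pass through the style vector. If the encoder only sees local components, global traits such as inter-component spacing have no place to be represented.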
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
