CalliffusionV2: Personalized Natural Calligraphy Generation with Flexible Multi-modal Control

Qisheng Liao; Liang Li; Yulang Fei; Gus Xia

arXiv:2410.03787 · cs.CL · October 8, 2024

Open Access

TL;DR

CalliffusionV2 is a versatile system for generating natural Chinese calligraphy with multi-modal control, enabling style customization, quick style learning, and support for non-Chinese characters, validated by both neural and human assessments.

Contribution

The paper introduces a calligraphy generation system with multi-modal control that allows fine-grained style customization and rapid adaptation to new styles from minimal data.

Findings

01. Produces stylistically accurate calligraphy recognized by classifiers and humans

02. Supports quick learning of new styles with few-shot training

03. Generates non-Chinese characters without prior training

Abstract

In this paper, we introduce CalliffusionV2, a novel system designed to produce natural Chinese calligraphy with flexible multi-modal control. Unlike previous approaches that rely solely on image or text inputs and lack fine-grained control, our system uses both images, which guide generation at a fine-grained level, and natural-language text, which describes the desired features of the output. CalliffusionV2 excels at creating a broad range of characters and can quickly learn new styles through a few-shot learning approach. It is also capable of generating non-Chinese characters without prior training. Comprehensive tests confirm that our system produces calligraphy that is both stylistically accurate and recognizable by neural network classifiers and human evaluators.

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques