CalliMaster: Mastering Page-level Chinese Calligraphy via Layout-guided Spatial Planning
Tianshuo Xu, Tiantian Hong, Zhifei Chen, Fei Chao, Ying-cong Chen

TL;DR
CalliMaster is a novel framework that combines spatial planning and content synthesis within a single model to generate high-quality, controllable Chinese calligraphy at the page level, inspired by human writing cognition.
Contribution
It introduces a unified, coarse-to-fine pipeline with a Multimodal Diffusion Transformer that decouples layout planning from content synthesis for improved control and quality.
Findings
Achieves state-of-the-art calligraphy generation quality.
Enables user-controlled layout adjustments and re-planning.
Extends to artifact restoration and forensic analysis.
Abstract
Page-level calligraphy synthesis requires balancing glyph precision with layout composition. Existing character models lack spatial context, while page-level methods often compromise brushwork detail. In this paper, we present CalliMaster, a unified framework for controllable generation and editing that resolves this conflict by decoupling spatial planning from content synthesis. Inspired by the human cognitive process of "planning before writing", we introduce a coarse-to-fine pipeline (Text → Layout → Image) to tackle the combinatorial complexity of page-scale synthesis. Operating within a single Multimodal Diffusion Transformer, a spatial planning stage first predicts character bounding boxes to establish the global spatial arrangement. This intermediate layout then serves as a geometric prompt for the content synthesis stage, where the…
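To make the intermediate "layout" representation concrete, here is a minimal sketch of a page plan expressed as per-character bounding boxes, plus one possible way of rendering it into a spatial map. This is an illustration under stated assumptions, not CalliMaster's method. In the paper, the boxes are predicted by the spatial planning stage of the Multimodal Diffusion Transformer. The fixed rule-based planner, the names `CharBox`, `plan_vertical_layout`, and `rasterize_layout`, the normalized [0, 1] coordinates, the right-to-left column order, and the index map used as a stand-in for the "geometric prompt" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharBox:
    """One planned character slot (hypothetical); coordinates normalized to [0, 1]."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def plan_vertical_layout(text: str, chars_per_col: int, margin: float = 0.05) -> list[CharBox]:
    """Hypothetical rule-based stand-in for a learned planner: uniform cells,
    filled top-to-bottom within a column, columns ordered right-to-left."""
    if chars_per_col <= 0:
        raise ValueError("chars_per_col must be positive")
    n_cols = max(1, -(-len(text) // chars_per_col))  # ceiling division
    col_w = (1 - 2 * margin) / n_cols
    cell_h = (1 - 2 * margin) / chars_per_col
    boxes = []
    for i, ch in enumerate(text):
        col, row = divmod(i, chars_per_col)
        x = 1 - margin - (col + 1) * col_w  # column 0 is the rightmost
        y = margin + row * cell_h
        boxes.append(CharBox(ch, x, y, col_w, cell_h))
    return boxes


def rasterize_layout(boxes: list[CharBox], size: int) -> list[list[int]]:
    """Render boxes into a size x size index map: 0 = background,
    k = k-th character (1-based). A cell belongs to a box when its centre
    lies inside that box; later boxes overwrite earlier ones on overlap."""
    grid = [[0] * size for _ in range(size)]
    for k, b in enumerate(boxes, start=1):
        for r in range(size):
            cy = (r + 0.5) / size
            if not (b.y <= cy < b.y + b.h):
                continue
            for c in range(size):
                cx = (c + 0.5) / size
                if b.x <= cx < b.x + b.w:
                    grid[r][c] = k
    return grid


if __name__ == "__main__":
    layout = plan_vertical_layout("永字八法", chars_per_col=2)
    for box in layout:
        print(box)
    for row in rasterize_layout(layout, size=10):
        print("".join(str(v) if v else "." for v in row))
```

Swapping the fixed grid for predicted boxes, and editing individual boxes before re-rendering, is one way to picture the user-controlled layout adjustment listed under Findings. Keeping the plan as explicit geometry is what makes such edits possible without touching the glyph content.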
Taxonomy
Topics: Interactive and Immersive Displays · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis
