SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control
Ji Gan, Lingxu Chen, Jiaxu Leng, Xinbo Gao

TL;DR
SplatFont3D introduces a structure-aware framework for 3D artistic font generation from text prompts, enabling precise part-level style control and improved visual quality using 3D Gaussian splatting and diffusion models.
Contribution
It presents a novel 3D font generation method that incorporates part-level style control and efficient rendering, addressing limitations of previous 2D-focused approaches.
Findings
Outperforms existing 3D models in style-text consistency
Achieves higher visual quality in generated fonts
Offers higher rendering efficiency
Abstract
Artistic font generation (AFG) can assist human designers in creating innovative artistic fonts. However, most previous studies primarily focus on 2D artistic fonts in flat design, leaving personalized 3D-AFG largely underexplored. 3D-AFG not only enables applications in immersive 3D environments such as video games and animations, but also may enhance 2D-AFG by rendering 2D fonts of novel views. Moreover, unlike general 3D objects, 3D fonts exhibit precise semantics with strong structural constraints and also demand fine-grained part-level style control. To address these challenges, we propose SplatFont3D, a novel structure-aware text-to-3D AFG framework with 3D Gaussian splatting, which enables the creation of 3D artistic fonts from diverse style text prompts with precise part-level style control. Specifically, we first introduce a Glyph2Cloud module, which progressively enhances both…
Peer Reviews
Decision: Submitted to ICLR 2026
1. **Reasonable Solutions towards the 3D fonts challenge**: They use 2D diffusion priors to initialize 3D point clouds, balancing shape preservation and stylistic fidelity through denoising interventions and segmentation. The dynamic component assignment is a smart solution to Gaussian drift, enabling explicit part decomposition superior to implicit representations like NeRF.
2. **Extensive evaluation results**: The evaluation is comprehensive, covering global and part-level scenarios across div
1. **Limited Scope of Evaluation Data**: The dataset comprises only 44 characters with 2 styles and modes each (1760 pairs total), which may not fully represent the diversity of fonts or languages. While including Chinese characters adds some breadth, the focus on limited categories (e.g., fruits, foods) could bias results toward simpler styles, potentially limiting generalizability to more complex or abstract prompts.
2. **Lack of ablation studies about each component**. The paper lacks several
By designing Glyph2Cloud and DCA specifically for fonts, which are highly structured objects, the paper effectively leverages the advantages of the explicit 3D Gaussian Splatting (3DGS) representation, such as high rendering efficiency and structural decomposability, which make 3DGS well suited to this task. The work addresses a clear yet underexplored need in structured 3D artistic font generation, filling an important research gap in applying 3DGS to font modeling and generation. The experiments are comprehensiv
Data and Generalization: The evaluation dataset remains limited in scale and linguistic diversity. It is recommended to extend the experiments to fonts with higher stroke density, more complex structures, and additional languages. Moreover, it would be beneficial to report the control effectiveness and computational overhead under finer-grained component segmentation settings (e.g., more than three components).
Relation to Feed-Forward 3D Generation: Recent feed-forward 3D generation methods (e
- The paper clearly articulates the need for and importance of 3D artistic fonts with fine-grained style control, as well as the limitations of previous NeRF/3DGS-based models in handling font semantic constraints.
- It proposes Glyph2Cloud, a method for initializing global 3D representations, and a dynamic component assignment strategy for local component editing, achieving geometric information grouping and iterative optimization for disentanglement.
- Experiments demonstrate that SplatFont3
- The introduction is overly verbose, with Lines 54-78 and Lines 79-96 containing repetitive content. Figure 1 does not annotate the symbols used in the method section, which makes the notation there hard to follow.
- The multi-view consistency is insufficient in Table 4.2. For example, in Figure 6, the side view of the leaves still appears wide. Regarding efficiency, while using 3DGS for geometric representation is an obvious way to improve efficiency, the three-stage training strate
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis
