From Tokens to Numbers: Continuous Number Modeling for SVG Generation
Michael Ogezi, Martin Bell, Freda Shi, and Ethan Smith

TL;DR
This paper introduces Continuous Number Modeling (CNM), a novel approach for SVG generation that directly models numerical parameters as continuous values, improving training efficiency and visual quality over token-based methods.
Contribution
The paper proposes CNM, a continuous modeling framework for SVG parameters, and demonstrates its effectiveness through a multimodal transformer trained on 2 million samples with reinforcement learning.
Findings
Training speed increased by over 30%
Higher perceptual fidelity than token-based alternatives
Effective for high-quality vector graphic generation
Abstract
For certain image generation tasks, vector graphics such as Scalable Vector Graphics (SVGs) offer clear benefits such as increased flexibility, size efficiency, and editing ease, but remain less explored than raster-based approaches. A core challenge is that the numerical, geometric parameters, which make up a large proportion of SVGs, are inefficiently encoded as long sequences of tokens. This slows training, reduces accuracy, and hurts generalization. To address these problems, we propose Continuous Number Modeling (CNM), an approach that directly models numbers as first-class, continuous values rather than discrete tokens. This formulation restores the mathematical elegance of the representation by aligning the model's inputs with the data's continuous nature, removing discretization artifacts introduced by token-based encoding. We then train a multimodal transformer on 2 million…
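To make the token-length problem concrete, here is a minimal sketch that compares two ways of representing the same SVG path. This is not the authors' implementation. The character-level tokenizer, the `<num>` placeholder, and the helper names `char_tokens` and `split_numbers` are assumptions chosen for illustration; the paper's actual tokenizer and number representation may differ.

```python
import re

# Matches signed decimals such as 12, -3.5, .75, or 1e-3.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def char_tokens(path: str) -> list[str]:
    """Hypothetical token-based encoding: each non-space character is one token."""
    return [ch for ch in path if not ch.isspace()]


def split_numbers(path: str) -> tuple[list[str], list[float]]:
    """Hypothetical continuous encoding: each number becomes one <num> slot,
    and its value is kept as a float in a parallel list."""
    values = [float(m.group()) for m in NUMBER.finditer(path)]
    symbols = NUMBER.sub(" <num> ", path).replace(",", " ").split()
    return symbols, values


path = "M 10.25 20.5 L 110.75 20.5 L 60.125 95.0 Z"
tokens = char_tokens(path)
symbols, values = split_numbers(path)

print("character-level tokens:", len(tokens))
print("sequence with <num> slots:", len(symbols))
print("continuous values:", values)
```

In this toy path, most of the character-level tokens are digits and decimal points, whereas the slot-based sequence has one position per number regardless of its precision. How the continuous values are embedded and predicted inside the transformer is not described in the excerpt above, so the sketch stops at the data representation.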
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications
