VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
Guotao Liang, Zhangcheng Wang, Chuang Wang, Juncheng Hu, Haitao Zhou, Junhua Liu, Jing Zhang, Dong Xu, and Qian Yu

TL;DR
VAnim is an LLM-based framework for open-domain text-to-SVG animation. It models animation as sparse state updates on a persistent SVG DOM tree, which preserves structure and enables precise control, and it improves semantic alignment and structural validity over baselines.
Contribution
The paper introduces the Sparse State Updates (SSU) paradigm for SVG animation, together with a control mechanism and a rendering-aware reinforcement learning approach, and provides the SVGAnim-134k benchmark.
Findings
VAnim compresses sequence length by over 9.8x.
It outperforms baselines in semantic alignment and structural validity.
Rendering-aware reinforcement learning supplies high-fidelity visual feedback.
Abstract
Scalable Vector Graphics (SVG) animation generation is pivotal for professional design because SVGs are structurally editable and resolution independent. However, the task remains challenging because it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often destroy topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations and fail to model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation, but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by over 9.8x while preserving the SVG DOM structure and non-participating elements by construction. To enable precise control, we propose an…
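The idea of sparse state updates on a persistent DOM tree can be illustrated with a minimal sketch. The update format below (a list of per-keyframe dictionaries mapping element ids to changed attributes), the function name apply_sparse_updates, and the toy SVG are all assumptions made for illustration; the text above does not specify VAnim's actual representation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix

# Toy static SVG (hypothetical example, not from the paper).
BASE_SVG = (
    f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">'
    '<rect id="sky" width="100" height="100" fill="#cce"/>'
    '<circle id="ball" cx="20" cy="80" r="8" fill="red"/>'
    '<path id="flag" d="M60 20 L80 25 L60 30 Z" fill="green"/>'
    "</svg>"
)

# Hypothetical sparse updates: each keyframe lists only the elements and
# attributes that change. Anything not mentioned keeps its previous state.
UPDATES = [
    {"ball": {"cx": "40", "cy": "50"}},
    {
        "ball": {"cx": "60", "cy": "80"},
        # A geometry-level (non-rigid) change: the path data itself is edited.
        "flag": {"d": "M60 20 Q72 18 80 27 L60 30 Z"},
    },
]


def apply_sparse_updates(base_svg, updates):
    """Return one serialized SVG per keyframe, starting with the base state.

    The DOM tree is parsed once and mutated in place, so elements are never
    added, removed, or reordered; only attribute values change.
    """
    root = ET.fromstring(base_svg)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    frames = [ET.tostring(root, encoding="unicode")]
    for update in updates:
        for element_id, attrs in update.items():
            element = by_id[element_id]  # KeyError if the id is unknown
            for name, value in attrs.items():
                element.set(name, value)
        frames.append(ET.tostring(root, encoding="unicode"))
    return frames


frames = apply_sparse_updates(BASE_SVG, UPDATES)
```

In this sketch the "sky" rectangle never appears in an update, so it is identical in every frame, and the element tree has the same shape throughout. The update list is also shorter than three full SVG documents would be, which is the intuition behind the sequence-length compression described above, although this toy example says nothing about the reported 9.8x figure.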
