LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Junhao Chen; Kejun Gao; Yuehan Cui; Mingze Sun; Mingjin Chen; Shaohui Wang; Xiaoxiao Long; Fei Ma; Qi Tian; Ruqi Huang; Hao Zhao

arXiv:2604.11792 · cs.CV · April 14, 2026


TL;DR

LottieGPT introduces a framework for tokenizing and autoregressively generating vector animations. It combines a tailored Lottie tokenizer with a large dataset, enabling natural-language-driven creation of editable, resolution-independent animations.

Contribution

LottieGPT is the first framework to enable native vector animation generation. It does so by tokenizing Lottie animations and fine-tuning a multimodal model, significantly advancing multimedia synthesis capabilities.

Findings

1. The tokenizer reduces sequence length while maintaining fidelity.
2. LottieGPT generalizes across diverse animation styles.
3. LottieGPT outperforms previous SVG generation models.

Abstract

Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current generative models operate exclusively in raster space and thus cannot synthesize them. Meanwhile, recent advances in large multimodal models demonstrate strong capabilities in generating structured data such as slides, 3D meshes, LEGO sequences, and indoor layouts, suggesting that native vector animation generation may be achievable. In this work, we present the first framework for tokenizing and autoregressively generating vector animations. We adopt Lottie, a widely deployed JSON-based animation standard, and design a tailored Lottie Tokenizer that encodes layered geometric…
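The abstract is cut off before it describes how the Lottie Tokenizer works, so the following is only a hypothetical illustration and not the paper's tokenizer. It is a minimal sketch of one common way a structured tokenizer can shorten sequences compared with raw JSON text. Each position keyframe, written in Lottie's `{"t": frame, "s": [x, y]}` keyframe form, becomes exactly four discrete tokens. The coordinates are quantized into 256 bins over an assumed 512×512 canvas. The function names, token strings, canvas size, and bin count are all assumptions chosen for illustration.

```python
import json


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a float in [lo, hi] to an integer bin in [0, bins - 1]."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * (bins - 1))


def dequantize(index: int, lo: float, hi: float, bins: int) -> float:
    """Inverse of quantize: return the representative value of a bin."""
    return lo + index / (bins - 1) * (hi - lo)


def tokenize_position_track(keyframes, width=512, height=512, bins=256):
    """Hypothetical tokenizer for Lottie-style position keyframes.

    Each keyframe {"t": frame, "s": [x, y]} becomes 4 tokens:
    a keyframe marker, the frame index, and quantized x and y.
    """
    tokens = []
    for kf in keyframes:
        x, y = kf["s"][:2]
        tokens += [
            "<KF>",
            f"<T{kf['t']}>",
            f"<X{quantize(x, 0, width, bins)}>",
            f"<Y{quantize(y, 0, height, bins)}>",
        ]
    return tokens


def detokenize_position_track(tokens, width=512, height=512, bins=256):
    """Rebuild approximate keyframes from groups of 4 tokens."""
    keyframes = []
    for i in range(0, len(tokens), 4):
        _, t_tok, x_tok, y_tok = tokens[i:i + 4]
        # Strip the "<T", "<X", "<Y" prefixes and the trailing ">".
        keyframes.append({
            "t": int(t_tok[2:-1]),
            "s": [
                dequantize(int(x_tok[2:-1]), 0, width, bins),
                dequantize(int(y_tok[2:-1]), 0, height, bins),
            ],
        })
    return keyframes


# Usage: a shape that moves right and back over 60 frames.
track = [
    {"t": 0, "s": [100.0, 256.0]},
    {"t": 30, "s": [400.0, 256.0]},
    {"t": 60, "s": [100.0, 256.0]},
]
tokens = tokenize_position_track(track)
restored = detokenize_position_track(tokens)
# Character count of the JSON text serves as a rough proxy for a
# character-level encoding of the same keyframes.
json_chars = len(json.dumps(track))
```

Under these assumptions, quantization trades exactness for a fixed vocabulary. With 256 bins over 512 units, each coordinate lands within about one unit of its original value after a round trip. This excerpt does not state whether LottieGPT uses binning, or at what resolution.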
