Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
Jooyeol Yun, Jaegul Choo

TL;DR
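This paper presents Vector Prism, a framework that recovers the semantic structure of SVGs so that vision-language models (VLMs) can animate them more coherently and reliably. It targets a specific failure mode: visually coherent parts are often fragmented into low-level shapes that give little indication of which elements should move together.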
Contribution
It introduces a semantic aggregation method that statistically combines multiple weak part predictions and reorganizes SVG elements into semantic groups, improving the coherence of VLM-generated animations.
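To make the regrouping idea concrete, here is a minimal sketch of wrapping SVG elements that share a semantic label into a common `<g>` element. It is an illustration under stated assumptions, not the authors' implementation: the function name `regroup_svg`, the `group-<label>` id scheme, and the restriction to top-level elements are all hypothetical, and the labels are assumed to be given.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix


def regroup_svg(svg_text, labels):
    """Wrap top-level elements that share a label in one <g> element.

    svg_text: SVG document as a string.
    labels:   dict mapping element id -> semantic label (hypothetical input).
    Elements without a label are left where they are.
    """
    root = ET.fromstring(svg_text)
    groups = {}
    for elem in list(root):  # iterate over a snapshot, since root is mutated
        label = labels.get(elem.get("id"))
        if label is None:
            continue
        if label not in groups:
            group = ET.Element(f"{{{SVG_NS}}}g", {"id": f"group-{label}"})
            # Place the group where its first member was, to roughly keep paint order.
            root.insert(list(root).index(elem), group)
            groups[label] = group
        root.remove(elem)
        groups[label].append(elem)
    return ET.tostring(root, encoding="unicode")


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="a" d="M0 0"/><path id="b" d="M1 1"/><circle id="c" r="1"/>'
    "</svg>"
)
regrouped = regroup_svg(svg, {"a": "ear", "b": "ear", "c": "eye"})
```

Once elements sit under one group, an animation can target the group instead of each fragment. One caveat of this simple approach: if members of different groups are interleaved in the original file, moving them together can change the paint order.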
Findings
Substantial improvements over existing methods in SVG animation quality
Semantic recovery enhances interpretability of VLM interactions with vector graphics
Statistical aggregation of multiple weak part predictions yields stable semantic inference despite noisy individual predictions
Abstract
Scalable Vector Graphics (SVG) are central to modern web design, and the demand to animate them continues to grow as web environments become increasingly dynamic. Yet automating the animation of vector graphics remains challenging for vision-language models (VLMs) despite recent progress in code generation and motion planning. VLMs routinely mishandle SVGs, since visually coherent parts are often fragmented into low-level shapes that offer little guidance on which elements should move together. In this paper, we introduce a framework that recovers the semantic structure required for reliable SVG animation and reveals the missing layer that current VLM systems overlook. This is achieved through a statistical aggregation of multiple weak part predictions, allowing the system to stably infer semantics from noisy predictions. By reorganizing SVGs into semantic groups, our approach enables…
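The abstract does not specify which statistical aggregation is used. As a rough illustration of the general idea only, the sketch below swaps in plain majority voting: each SVG element receives several noisy part labels, the most frequent label wins, and elements are then grouped by their winning label. The function names, the input format, and the example labels are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def aggregate_labels(predictions):
    """Majority-vote over weak predictions (a stand-in, not the paper's method).

    predictions: dict mapping element id -> list of labels, one per weak prediction.
    Returns a dict mapping element id -> (winning label, fraction of votes it got).
    """
    aggregated = {}
    for elem_id, labels in predictions.items():
        label, votes = Counter(labels).most_common(1)[0]
        aggregated[elem_id] = (label, votes / len(labels))
    return aggregated


def group_by_label(aggregated):
    """Collect element ids that share the same winning label."""
    groups = defaultdict(list)
    for elem_id, (label, _agreement) in aggregated.items():
        groups[label].append(elem_id)
    return dict(groups)


# Hypothetical noisy predictions for three SVG paths.
predictions = {
    "path1": ["ear", "ear", "nose"],
    "path2": ["ear", "ear", "ear"],
    "path3": ["nose", "eye", "nose", "nose"],
}
aggregated = aggregate_labels(predictions)
groups = group_by_label(aggregated)
```

The agreement fraction gives a simple signal for how trustworthy each assignment is; a low value could flag elements whose label should be re-checked rather than used directly for grouping.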
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Data Visualization and Analytics
