StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli

TL;DR
StarVector is a multimodal large language model that generates compact, semantically rich SVG code from images and text by understanding image semantics and using a diverse set of SVG primitives.
Contribution
The paper introduces StarVector, an SVG generation model that operates directly in SVG code space. It also introduces a new dataset (SVG-Stack) and benchmark (SVG-Bench) for training and evaluating vector graphic synthesis.
Findings
StarVector outperforms previous methods in generating compact SVGs.
The SVG-Stack dataset enables better generalization across vectorization tasks.
SVG-Bench provides a comprehensive evaluation framework for SVG generation.
Abstract
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation methods have focused on curve-based vectorization, lacking semantic understanding, often producing artifacts, and struggling with SVG primitives beyond path curves. To address these issues, we introduce StarVector, a multimodal large language model for SVG generation. It performs image vectorization by understanding image semantics and using SVG primitives for compact, precise outputs. Unlike traditional methods, StarVector works directly in the SVG code space, leveraging visual understanding to apply accurate SVG primitives. To train StarVector, we create SVG-Stack, a diverse dataset of 2M samples that enables generalization across vectorization tasks and precise use of primitives like ellipses, polygons, and text. We address challenges in SVG…
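The abstract's point that dedicated primitives give more compact output than path curves can be made concrete with a small hypothetical illustration. It is not taken from StarVector or its code. It writes the same circle twice: once with the SVG `<circle>` primitive, and once as a generic path of four cubic Bézier segments, the kind of output a purely curve-based vectorizer would produce. The function names, coordinates, and two-decimal formatting are our own assumptions.

```python
import math

# Hypothetical example shape: a circle centred at (50, 50) with radius 40.
CX, CY, R = 50.0, 50.0, 40.0


def circle_as_primitive(cx: float, cy: float, r: float) -> str:
    """Encode the shape with the dedicated SVG <circle> primitive."""
    return f'<circle cx="{cx:g}" cy="{cy:g}" r="{r:g}"/>'


def circle_as_path(cx: float, cy: float, r: float) -> str:
    """Approximate the same circle with four cubic Bezier segments, one per quadrant."""
    # Standard control-point offset for a quarter-circle cubic, about 0.5523 * r.
    k = 4 * (math.sqrt(2) - 1) / 3 * r
    d = (
        f"M{cx + r:.2f},{cy:.2f} "
        f"C{cx + r:.2f},{cy + k:.2f} {cx + k:.2f},{cy + r:.2f} {cx:.2f},{cy + r:.2f} "
        f"C{cx - k:.2f},{cy + r:.2f} {cx - r:.2f},{cy + k:.2f} {cx - r:.2f},{cy:.2f} "
        f"C{cx - r:.2f},{cy - k:.2f} {cx - k:.2f},{cy - r:.2f} {cx:.2f},{cy - r:.2f} "
        f"C{cx + k:.2f},{cy - r:.2f} {cx + r:.2f},{cy - k:.2f} {cx + r:.2f},{cy:.2f} Z"
    )
    return f'<path d="{d}"/>'


if __name__ == "__main__":
    primitive = circle_as_primitive(CX, CY, R)
    path = circle_as_path(CX, CY, R)
    # Print each encoding next to its length in characters.
    print(primitive, len(primitive))
    print(path, len(path))
```

The path version is only an approximation, with a small radial error, and it is several times longer. It also no longer states that the shape is a circle. This is the kind of semantic, compact encoding the abstract argues a model working in SVG code space can choose directly.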
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training · ALIGN · Adapter
