Vector Grimoire: Codebook-based Shape Generation under Raster Image Supervision
Moritz Feuerpfeil, Marco Cipriano, Gerard de Melo

TL;DR
GRIMOIRE is a text-guided SVG generative model that learns to produce vector graphics using only raster image supervision, which opens vector shape generation to significantly more training data than methods that require SVG supervision.
Contribution
It introduces a raster-image-supervised approach to SVG generation that combines a visual shape quantizer with an autoregressive transformer for natural-language-guided vector creation.
Findings
Outperforms previous image-supervised methods in quality
Works effectively on MNIST, icon, and font datasets
Enables scalable vector graphic generation from raster images
Abstract
Scalable Vector Graphics (SVG) is a popular format on the web and in the design industry. However, despite the great strides made in generative modeling, SVG has remained underexplored due to the discrete and complex nature of such data. We introduce GRIMOIRE, a text-guided SVG generative model that comprises two modules: a Visual Shape Quantizer (VSQ) learns to map raster images onto a discrete codebook by reconstructing them as vector shapes, and an Auto-Regressive Transformer (ART) models the joint probability distribution over shape tokens, positions, and textual descriptions, allowing us to generate vector graphics from natural language. Unlike existing models that require direct supervision from SVG data, GRIMOIRE learns shape image patches using only raster image supervision, which opens up vector generative modeling to significantly more data. We demonstrate the…
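To make the idea of mapping image patches onto a discrete codebook concrete, here is a minimal sketch of generic nearest-neighbour vector quantization. It is not taken from the paper and may differ from the quantization scheme GRIMOIRE actually uses; the function name `quantize`, the toy 2-D codebook, and the patch embeddings are all hypothetical, chosen only for illustration.

```python
import math


def quantize(patch_embedding, codebook):
    """Return (index, code) of the codebook entry nearest to the embedding.

    Generic nearest-neighbour lookup under Euclidean distance; an
    illustrative assumption, not the paper's implementation.
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(patch_embedding, codebook[i]),
    )
    return best_index, codebook[best_index]


# Hypothetical 2-D codebook with three entries (a real codebook is learned).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical embeddings of three image patches, as an encoder might emit.
patch_embeddings = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

# Each patch becomes one discrete token: the index of its nearest code.
tokens = [quantize(e, codebook)[0] for e in patch_embeddings]
print(tokens)  # [1, 2, 0]
```

In a two-stage setup of this kind, a sequence model would then be trained on such token indices (together with positions and text) rather than on raw pixels.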
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Simulation and Modeling Applications
Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
