VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

Qijia He; Xunmei Liu; Hammaad Memon; Ziang Li; Zixian Ma; Jaemin Cho; Jason Ren; Daniel S Weld; Ranjay Krishna

arXiv:2603.24575 · cs.CV · March 26, 2026

TL;DR

VFIG is a family of vision-language models that converts raster images of complex technical figures into editable SVG, targeting the common case where the original vector source is lost and only a flattened PNG or JPEG remains.

Contribution

The paper introduces the VFIG model family; VFIG-DATA, a large-scale dataset of 66K figure-SVG pairs; a hierarchical training curriculum; and VFIG-BENCH, an evaluation suite for high-fidelity figure-to-SVG conversion.

Findings

01. VFIG outperforms existing open-source models in figure-to-SVG conversion.

02. VFIG achieves a VLM-Judge score of 0.829, comparable to GPT-5.2.

03. The hierarchical training curriculum improves structural fidelity and layout accuracy.

Abstract

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs,…
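To make the editability claim concrete, here is a minimal sketch in Python of what "semantic editability" can look like once a figure exists as SVG. The two-box diagram, the element ids, and the `recolor` helper are all hypothetical and written only for illustration; none of them come from the paper, and the sketch says nothing about how VFIG itself produces SVG.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical two-box flowchart, written by hand for illustration only.
SVG_SOURCE = f"""<svg xmlns="{SVG_NS}" viewBox="0 0 220 60">
  <rect id="box-a" x="10" y="10" width="80" height="40" fill="#cfe2ff"/>
  <rect id="box-b" x="130" y="10" width="80" height="40" fill="#cfe2ff"/>
  <line id="edge-ab" x1="90" y1="30" x2="130" y2="30" stroke="black"/>
  <text x="50" y="35" text-anchor="middle">Input</text>
  <text x="170" y="35" text-anchor="middle">Output</text>
</svg>"""


def recolor(svg_text: str, element_id: str, fill: str) -> str:
    """Return a copy of the SVG with one element's fill attribute changed."""
    ET.register_namespace("", SVG_NS)  # keep the default namespace on output
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("id") == element_id:
            element.set("fill", fill)
    return ET.tostring(root, encoding="unicode")


# Editing one shape is a single attribute change; the geometry is untouched.
edited = recolor(SVG_SOURCE, "box-b", "#ffd6a5")
```

The same change on a PNG or JPEG would require repainting pixels and re-rendering any overlapping text, which is the gap that figure-to-SVG conversion aims to close.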

Code & Models

Models

XunmeiLiu/VFIG-4B (Hugging Face model · 263 downloads · 3 likes)

Taxonomy

Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Handwritten Text Recognition Techniques