VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S Weld, Ranjay Krishna

TL;DR
VFIG is a family of vision-language models that convert raster images of complex technical figures into editable SVG vector graphics, addressing the challenge of reconstructing original diagrams from flattened images.
Contribution
The paper introduces four components for high-fidelity figure-to-SVG conversion: the VFIG model family; VFIG-DATA, a large-scale dataset of 66K figure-SVG pairs; a hierarchical training curriculum; and VFIG-BENCH, a comprehensive evaluation suite.
Findings
VFIG outperforms existing open-source models in figure-to-SVG conversion.
VFIG achieves a VLM-Judge score of 0.829, comparable to GPT-5.2.
The hierarchical training curriculum improves structural fidelity and layout accuracy.
Abstract
Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs,…
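To make the "semantic editability" point concrete, here is a minimal sketch of why an SVG is easier to modify than a flat raster. It is not VFIG's pipeline or output format: the two-box diagram, the element ids (box-a, box-b, label-a), and the helper edit_figure are all hypothetical, chosen only for illustration. Because each shape and label is a separate XML node, a single attribute or text edit changes the figure without redrawing it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical two-box diagram; ids and layout are illustrative assumptions,
# not taken from the paper.
svg_source = f"""<svg xmlns="{SVG_NS}" viewBox="0 0 200 80">
  <rect id="box-a" x="10" y="20" width="60" height="40" fill="#cfe2ff"/>
  <rect id="box-b" x="130" y="20" width="60" height="40" fill="#cfe2ff"/>
  <line id="edge" x1="70" y1="40" x2="130" y2="40" stroke="black"/>
  <text id="label-a" x="40" y="45" text-anchor="middle">Input</text>
</svg>"""


def edit_figure(source: str) -> str:
    """Recolor one box and rename one label by editing individual XML nodes."""
    # Keep the SVG namespace as the default namespace when serializing.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(source)
    for element in root.iter():
        if element.get("id") == "box-b":
            element.set("fill", "#ffd6d6")   # change a single shape's color
        if element.get("id") == "label-a":
            element.text = "Raster image"    # change a single label's text
    return ET.tostring(root, encoding="unicode")


edited = edit_figure(svg_source)
print(edited)
```

The same change on a PNG or JPEG would require repainting pixels, since a raster image stores no record of which pixels belong to which box or label.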
Taxonomy
Topics: Multimodal Machine Learning Applications · Data Visualization and Analytics · Handwritten Text Recognition Techniques
