Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
Guotao Liang, Zhangcheng Wang, Juncheng Hu, Haitao Zhou, Ziteng Xue, Jing Zhang, Dong Xu, Qian Yu

TL;DR
This paper introduces Render-in-the-Loop, a visual feedback-based SVG generation method that improves reasoning about partial states and occlusions, outperforming open-loop models.
Contribution
It proposes a step-wise, visual-context-aware SVG synthesis paradigm with visual self-feedback training and render-and-verify inference, enhancing model reasoning and accuracy.
Findings
Outperforms open-weight baselines on MMSVGBench.
Demonstrates improved data efficiency and generalization in SVG tasks.
Effectively leverages intermediate visual states for better generation quality.
Abstract
Multimodal Large Language Models (MLLMs) have shown promising capabilities in generating Scalable Vector Graphics (SVG) via direct code synthesis. However, existing paradigms typically adopt an open-loop "blind drawing" approach, where models generate symbolic code sequences without perceiving intermediate visual outcomes. This methodology severely underutilizes the powerful visual priors embedded in MLLMs' vision encoders, treating SVG generation as a disjointed textual sequence modeling task rather than an integrated visuo-spatial one. Consequently, models struggle to reason about partial canvas states and implicit occlusion relationships, which are visually explicit but textually ambiguous. To bridge this gap, we propose Render-in-the-Loop, a novel generation paradigm that reformulates SVG synthesis as a step-wise, visual-context-aware process. By rendering intermediate code states…
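Illustrative sketch (not from the paper): the abstract's point that occlusion is "visually explicit but textually ambiguous" can be shown with a toy example. The code below appends SVG elements one at a time and re-renders the partial canvas after each step. Everything here is an assumption made for illustration: `Rect`, `render`, `visible_pixels`, and `draw_step_by_step` are hypothetical names, the grid rasterizer is a stand-in for a real SVG renderer, and no model is involved. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle, the only shape this toy supports."""
    x: int
    y: int
    w: int
    h: int
    fill: str

    def to_svg(self) -> str:
        return (f'<rect x="{self.x}" y="{self.y}" '
                f'width="{self.w}" height="{self.h}" fill="{self.fill}"/>')


def render(shapes, size=8):
    """Toy rasterizer: paint shapes in document order, so later ones cover earlier ones."""
    canvas = [["." for _ in range(size)] for _ in range(size)]
    for s in shapes:
        for row in range(max(s.y, 0), min(s.y + s.h, size)):
            for col in range(max(s.x, 0), min(s.x + s.w, size)):
                canvas[row][col] = s.fill[0]  # first letter marks the color
    return canvas


def visible_pixels(canvas, fill):
    """Count the cells that still show the given fill color."""
    return sum(cell == fill[0] for row in canvas for cell in row)


def draw_step_by_step(shapes, size=8):
    """Yield (code_so_far, rendered_canvas) after each appended element."""
    drawn = []
    for s in shapes:
        drawn.append(s)
        code = "\n".join(d.to_svg() for d in drawn)
        yield code, render(drawn, size)


shapes = [Rect(1, 1, 4, 4, "red"), Rect(3, 3, 4, 4, "blue")]
states = list(draw_step_by_step(shapes))

red_before = visible_pixels(states[0][1], "red")  # red alone on the canvas
red_after = visible_pixels(states[1][1], "red")   # red after blue is drawn on top

for code, canvas in states:
    print(code)
    print("\n".join("".join(row) for row in canvas))
    print()
```

In this toy, the red rectangle's code line is identical in both states, yet its visible area shrinks once the blue rectangle is appended. That difference appears only in the rendered canvas, which is the kind of information an open-loop generator never sees.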
