VCG-Bench: Towards A Unified Visual-Centric Benchmark for Structured Generation and Editing

Xiaoyan Su; Peijie Dong; Zhenheng Tang; Song Tang; Yuyao Zhai; Kaitao Lin; Liang Chen; Gai Yuhang; Yuyu Luo; Qiang Wang; Xiaowen Chu

arXiv:2605.15677·cs.CL·May 18, 2026

TL;DR

VCG-Bench introduces a unified, diagram-as-code benchmark for structured diagram generation and editing, addressing limitations of pixel-based methods in vision-language models.

Contribution

The paper proposes a Diagram-as-Code paradigm that represents diagrams symbolically as mxGraph XML, and pairs it with a taxonomized dataset, an evaluation protocol, and an analysis of current model limitations.

Findings

1. Current SOTA VLMs struggle with structured fidelity.
2. The benchmark reveals challenges in instruction compliance.
3. The diagram-as-code approach improves editability and fidelity.

Abstract

Despite the rapid advancements in Vision-Language Models (VLMs), a critical gap remains in their ability to handle structured, controllable diagrammatic tasks essential for professional workflows. Existing methods predominantly rely on pixel-based synthesis, which operates in probabilistic pixel spaces and is inherently limited in editability and fidelity. Instead, we propose a new Diagram-as-Code paradigm with symbolic logic that leverages mxGraph Extensible Markup Language (XML) for precise diagram generation and editing. We present VCG-Bench, a unified benchmark for visual-centric mxGraph tasks. VCG-Bench comprises: (1) a taxonomized dataset of 1,449 diverse diagrams spanning 6 domains and 15 sub-domains, (2) a paradigm definition that integrates Generation (Vision-to-Code) and Editability (Code-to-Code), (3) a Tailored Evaluation Protocol employing multi-dimensional metrics…
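
To make the Diagram-as-Code idea concrete, the sketch below shows what a Code-to-Code edit can look like when a diagram is stored as mxGraph XML rather than pixels. It is only an illustration, not the benchmark's own pipeline: the two-node diagram and the helper names `rename_node` and `list_edges` are hypothetical, and the code uses nothing beyond Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# A hand-written two-node flowchart in mxGraph XML (illustrative only,
# not taken from the VCG-Bench dataset).
DIAGRAM_XML = """
<mxGraphModel>
  <root>
    <mxCell id="0"/>
    <mxCell id="1" parent="0"/>
    <mxCell id="2" value="Start" style="rounded=1;" vertex="1" parent="1">
      <mxGeometry x="40" y="40" width="120" height="60" as="geometry"/>
    </mxCell>
    <mxCell id="3" value="End" style="rounded=1;" vertex="1" parent="1">
      <mxGeometry x="240" y="40" width="120" height="60" as="geometry"/>
    </mxCell>
    <mxCell id="4" edge="1" source="2" target="3" parent="1">
      <mxGeometry relative="1" as="geometry"/>
    </mxCell>
  </root>
</mxGraphModel>
"""


def rename_node(xml_text: str, cell_id: str, new_label: str) -> str:
    """Hypothetical helper: change one cell's label and return the new XML."""
    root = ET.fromstring(xml_text)
    for cell in root.iter("mxCell"):
        if cell.get("id") == cell_id:
            cell.set("value", new_label)
    return ET.tostring(root, encoding="unicode")


def list_edges(xml_text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (source, target) id pairs for every edge cell."""
    root = ET.fromstring(xml_text)
    return [
        (cell.get("source"), cell.get("target"))
        for cell in root.iter("mxCell")
        if cell.get("edge") == "1"
    ]


# An edit instruction such as "rename the End node to Done" becomes a
# one-attribute change; geometry and connectivity are left untouched.
edited = rename_node(DIAGRAM_XML, "3", "Done")
print(list_edges(edited))
```

Because the edit touches a single attribute, the layout and the edge between the two nodes survive unchanged, which is the kind of editability that pixel-space synthesis cannot guarantee.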
