Benchmarking and Evaluating VLMs for Software Architecture Diagram Understanding

Shuyin Ouyang; Jie M. Zhang; Jingzhi Gong; Gunel Jahangirova; Mohammad Reza Mousavi; Jack Johns; Beum Seuk Lee; Adam Ziolkowski; Botond Virginas; Joost Noppen

arXiv:2604.04009 · cs.SE · April 7, 2026

TL;DR

This paper introduces SADU, a benchmark for evaluating vision-language models on understanding software architecture diagrams, revealing current models' limitations in diagram reasoning and grounding.

Contribution

The paper presents SADU, a new benchmark with curated diagrams and question-answer tasks, to assess and improve VLMs' understanding of software architecture diagrams.

Findings

1. The best model achieves only 70.18% accuracy on SADU tasks.

2. Current VLMs struggle with diagram reasoning and visual relation grounding.

3. A significant gap exists between current models and software engineering needs.
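
The accuracy figure above is a fraction of question-answer tasks answered correctly. The page does not say how answers are matched or how scores are aggregated, so the following is only a minimal sketch under stated assumptions: normalized exact-match scoring, and a hypothetical `QATask` record whose field names (`diagram_id`, `task_type`, `question`, `gold_answer`) are illustrative and not taken from the benchmark.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QATask:
    """One question-answer task. All field names are hypothetical."""
    diagram_id: str
    task_type: str    # assumed labels, e.g. "counting" or "retrieval"
    question: str
    gold_answer: str


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(answer.strip().lower().split())


def accuracy_by_type(tasks, predictions):
    """Return exact-match accuracy per task type plus an "overall" entry.

    Exact match is an assumption made for this sketch; the benchmark
    may score answers differently.
    """
    if len(tasks) != len(predictions):
        raise ValueError("tasks and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, prediction in zip(tasks, predictions):
        is_correct = normalize(prediction) == normalize(task.gold_answer)
        for key in (task.task_type, "overall"):
            total[key] += 1
            correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}


# Toy data invented for illustration; not drawn from SADU.
tasks = [
    QATask("d1", "counting", "How many classes are shown?", "4"),
    QATask("d1", "retrieval", "Which component calls Auth?", " Gateway "),
    QATask("d2", "counting", "How many entities are shown?", "3"),
]
predictions = ["4", "gateway", "5"]
scores = accuracy_by_type(tasks, predictions)
```

Reporting accuracy per task type alongside the overall number makes it easier to see whether a model is weaker at counting or at retrieval, which a single aggregate score would hide.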

Abstract

Software architecture diagrams are important design artifacts for communicating system structure, behavior, and data organization throughout the software development lifecycle. Although recent progress in large language models has substantially advanced code-centric software engineering tasks such as code generation, testing, and maintenance, the ability of modern vision-language models (VLMs) to understand software architecture diagrams remains underexplored. To address this gap, we present SADU, a benchmark for Software Architecture Diagram Understanding that evaluates VLMs on architecture diagrams as structured software engineering artifacts rather than generic images. SADU contains 154 carefully curated diagrams spanning behavioral, structural, and ER diagrams, paired with structured annotations and 2,431 question-answer tasks covering counting and retrieval reasoning. We evaluate…
