DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams
Jincheng Lou, Ruohan Xu, Jiapeng Li, Junyin Pi, Runzhe Tao, Weijian Fan, Xiao Tan, Guojie Luo, Yibo Lin

TL;DR
DiagramNet introduces a new multimodal dataset and a progressive training pipeline for recognizing and reasoning about complex system-level diagrams, significantly outperforming existing models and baselines.
Contribution
The paper presents the first multimodal dataset for system-level diagrams and a novel multi-stage workflow that enhances visual reasoning and generalizes across tasks.
Findings
Our 3B-parameter model outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x.
The workflow improves Task 1 performance by 128.7x for Gemini-2.5-Pro.
The approach transfers effectively to AMSBench in a zero-shot setting, requiring only minimal detector adaptation.
Abstract
System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge…
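The abstract describes the decoupled workflow only at a high level. The Python sketch below is a hypothetical illustration of what separating the Perception, Reasoning, and Knowledge stages could look like; it is not the authors' implementation. Every name in it (DiagramGraph, run_workflow, GLOSSARY, and the toy_* stand-ins) is an assumption. In a real system the perception stage would presumably be a detector plus an MLLM reading the image, not the hard-coded graph used here. The key idea shown is that each stage hands a structured intermediate result to the next, so the reasoning stage works on a module/connection graph rather than on raw pixels.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Connection:
    src: str  # source module name
    dst: str  # destination module name


@dataclass
class DiagramGraph:
    """Hypothetical structured output of the Perception stage."""
    modules: list[str]
    connections: list[Connection]

    def neighbors(self, module: str) -> list[str]:
        # Modules that `module` drives, sorted for a deterministic answer.
        return sorted(c.dst for c in self.connections if c.src == module)


# Hypothetical interfaces for the three decoupled stages.
PerceptionFn = Callable[[bytes], DiagramGraph]
ReasoningFn = Callable[[DiagramGraph, str], str]
KnowledgeFn = Callable[[str], str]


def run_workflow(image: bytes, question: str,
                 perceive: PerceptionFn, reason: ReasoningFn,
                 enrich: KnowledgeFn) -> str:
    graph = perceive(image)          # Perception: image -> structured graph
    draft = reason(graph, question)  # Reasoning: uses the graph only
    return enrich(draft)             # Knowledge: adds domain context


# --- Toy stand-ins (assumptions, for illustration only) ---
def toy_perceive(_image: bytes) -> DiagramGraph:
    # A real perception stage would detect modules and arrows in the image.
    return DiagramGraph(
        modules=["CPU", "DMA", "SRAM", "UART"],
        connections=[Connection("CPU", "SRAM"),
                     Connection("CPU", "UART"),
                     Connection("DMA", "SRAM")],
    )


def toy_reason(graph: DiagramGraph, question: str) -> str:
    # Handles only questions of the form "What does X connect to?".
    module = question.removeprefix("What does ").removesuffix(" connect to?")
    return ", ".join(graph.neighbors(module)) or "nothing"


GLOSSARY = {"SRAM": "on-chip memory", "UART": "serial interface"}


def toy_enrich(draft: str) -> str:
    # Append short domain notes for any module names found in the draft.
    notes = [f"{m}: {GLOSSARY[m]}" for m in draft.split(", ") if m in GLOSSARY]
    return draft + (" [" + "; ".join(notes) + "]" if notes else "")


if __name__ == "__main__":
    answer = run_workflow(b"", "What does CPU connect to?",
                          toy_perceive, toy_reason, toy_enrich)
    print(answer)
```

Running the script prints `SRAM, UART [SRAM: on-chip memory; UART: serial interface]`. Because the stages share only typed intermediate data, any one of them (for example, swapping in a stronger detector for perception) could in principle be replaced without retraining the others, which is one plausible motivation for a decoupled design.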
