SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches
Cheng Tan, Qi Chen, Jingxuan Wei, Gaowei Wu, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

TL;DR
SketchAgent is a multi-agent system that automates converting hand-drawn sketches into structured, semantically coherent diagrams, reducing manual effort and enabling applications in design, education, and engineering.
Contribution
We introduce SketchAgent, a novel multi-agent framework that automates sketch-to-diagram conversion, and the Sketch2Diagram Benchmark for evaluating such systems.
Findings
SketchAgent effectively produces accurate diagrams from sketches.
The Sketch2Diagram Benchmark provides a comprehensive dataset for evaluation.
Our approach significantly reduces manual effort in diagram creation.
Abstract
Hand-drawn sketches are a natural and efficient medium for capturing and conveying ideas. Despite significant advancements in controllable natural image generation, translating freehand sketches into structured, machine-readable diagrams remains a labor-intensive and predominantly manual task. The primary challenge stems from the inherent ambiguity of sketches, which lack the structural constraints and semantic precision required for automated diagram generation. To address this challenge, we introduce SketchAgent, a multi-agent system designed to automate the transformation of hand-drawn sketches into structured diagrams. SketchAgent integrates sketch recognition, symbolic reasoning, and iterative validation to produce semantically coherent and structurally accurate diagrams, significantly reducing the need for manual effort. To evaluate the effectiveness of our approach, we propose…
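The abstract names three stages (sketch recognition, symbolic reasoning, iterative validation) but this page gives no implementation detail. The following is only a toy sketch of how such a recognize, reason, validate loop might be wired together. It is not the authors' system: every name here (`Diagram`, `recognize`, `reason`, `validate`, `repair`, `sketch_to_diagram`) is hypothetical, and strokes are assumed to arrive as plain strings such as `"A -> B"` rather than as image data.

```python
from dataclasses import dataclass, field


@dataclass
class Diagram:
    """Toy structured diagram: named nodes plus directed edges (assumed format)."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)


def recognize(strokes):
    """Hypothetical recognition stage: label each stroke as a box or an arrow."""
    return [("arrow" if "->" in s else "box", s) for s in strokes]


def reason(shapes):
    """Hypothetical symbolic stage: turn labeled shapes into nodes and edges."""
    diagram = Diagram()
    for kind, text in shapes:
        if kind == "box":
            diagram.nodes.add(text.strip())
        else:
            src, dst = (part.strip() for part in text.split("->", 1))
            diagram.edges.append((src, dst))
    return diagram


def validate(diagram):
    """Return edge endpoints that have no matching node; empty means consistent."""
    return [end for edge in diagram.edges for end in edge
            if end not in diagram.nodes]


def repair(diagram, problems):
    """Hypothetical repair step: add a node for every dangling endpoint."""
    diagram.nodes.update(problems)
    return diagram


def sketch_to_diagram(strokes, max_rounds=3):
    """Run recognition and reasoning once, then validate and repair iteratively."""
    diagram = reason(recognize(strokes))
    for _ in range(max_rounds):
        problems = validate(diagram)
        if not problems:
            break
        diagram = repair(diagram, problems)
    return diagram
```

Calling `sketch_to_diagram(["Input", "Input -> Model", "Model -> Output"])` under these assumptions yields a diagram whose dangling arrow endpoints have been promoted to nodes. A real system would replace each stage with a learned or agent-driven component.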
Taxonomy
Topics: Multimodal Machine Learning Applications · Interactive and Immersive Displays · Data Visualization and Analytics
