See it. Say it. Sorted: Agentic System for Compositional Diagram Generation

Hantao Zhang; Jingyang Liu; Ed Li

arXiv:2508.15222 · cs.AI · November 18, 2025

TL;DR

This paper introduces a novel, training-free agentic system that combines vision-language and large language models to convert rough sketches into precise, editable SVG diagrams, improving layout fidelity and structural accuracy.

Contribution

The proposed system pairs vision-language models (VLMs) with large language models (LLMs) in an iterative loop that reasons qualitatively about the sketch, preserves global layout constraints, and allows human-in-the-loop corrections.

Findings

01. Outperforms GPT-5 and Gemini-2.5-Pro in reconstructing flowchart sketches.

02. Accurately composes complex primitives like multi-headed arrows.

03. Supports human-in-the-loop corrections and is extensible to presentation tools.

Abstract

We study sketch-to-diagram generation: converting rough hand sketches into precise, compositional diagrams. Diffusion models excel at photorealism but struggle with the spatial precision, alignment, and symbolic structure required for flowcharts. We introduce See it. Say it. Sorted., a training-free agentic system that couples a Vision-Language Model (VLM) with Large Language Models (LLMs) to produce editable Scalable Vector Graphics (SVG) programs. The system runs an iterative loop in which a Critic VLM proposes a small set of qualitative, relational edits; multiple candidate LLMs synthesize SVG updates with diverse strategies (conservative->aggressive, alternative, focused); and a Judge VLM selects the best candidate, ensuring stable improvement. This design prioritizes qualitative reasoning over brittle numerical estimates, preserves global constraints (e.g., alignment,…
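The critic, candidate, and judge loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `refine`, `toy_critic`, `conservative`, `aggressive`, and `toy_judge` are hypothetical, the model calls are replaced by deterministic string functions, and the "SVG" is a toy string of self-closing tags. Keeping the current program as option 0, so that the judge can reject every candidate, is one assumed way to obtain the stable improvement the abstract mentions.

```python
from typing import Callable, List, Sequence

# Hypothetical role signatures; in the paper these roles are played by VLM/LLM calls.
Critic = Callable[[str, str], List[str]]      # (sketch, svg) -> qualitative edits
Candidate = Callable[[str, List[str]], str]   # (svg, edits) -> updated svg
Judge = Callable[[str, Sequence[str]], int]   # (sketch, options) -> index of best


def refine(sketch: str, svg: str, critic: Critic,
           candidates: Sequence[Candidate], judge: Judge,
           max_rounds: int = 5) -> str:
    """Iteratively improve `svg` until the critic has no edits or no candidate wins."""
    for _ in range(max_rounds):
        edits = critic(sketch, svg)
        if not edits:
            break  # the critic is satisfied
        # Assumption: option 0 is the current program, so the judge may keep it.
        options = [svg] + [propose(svg, edits) for propose in candidates]
        best = judge(sketch, options)
        if best == 0:
            break  # no candidate improved on the current program
        svg = options[best]
    return svg


# Toy stand-ins: the "sketch" is a space-separated list of required tags.
def toy_critic(sketch: str, svg: str) -> List[str]:
    return [f"add {tag}" for tag in sketch.split() if f"<{tag}/>" not in svg]


def conservative(svg: str, edits: List[str]) -> str:
    # Apply only the first proposed edit.
    return svg + f"<{edits[0].split()[1]}/>"


def aggressive(svg: str, edits: List[str]) -> str:
    # Apply every proposed edit at once.
    return svg + "".join(f"<{e.split()[1]}/>" for e in edits)


def toy_judge(sketch: str, options: Sequence[str]) -> int:
    # Score each option by how many required tags it contains; ties keep the earliest.
    scores = [sum(f"<{t}/>" in opt for t in sketch.split()) for opt in options]
    return scores.index(max(scores))


if __name__ == "__main__":
    print(refine("rect circle line", "", toy_critic,
                 [conservative, aggressive], toy_judge))
    # prints <rect/><circle/><line/>
```

In this toy run the aggressive candidate wins the first round and the critic has nothing left to propose in the second. With only the conservative candidate, the loop adds one element per round, which shows how the judge gates each step.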

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics