DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal

TL;DR
DiagrammerGPT is a two-stage text-to-diagram framework: an LLM first plans the diagram's object layout and text labels, and a dedicated diagram generator then renders it. The result is more accurate than what existing text-to-image models produce for complex diagrams.
Contribution
The paper introduces a novel two-stage diagram generation approach using LLMs for planning and a dedicated diagram generator, along with a new benchmark dataset for evaluation.
Findings
Outperforms existing T2I models in diagram accuracy
Enables open-domain and multi-platform diagram generation
Supports human-in-the-loop editing and multimodal planning
Abstract
Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows/lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines, and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework leveraging the layout guidance capabilities of LLMs to generate more accurate diagrams. In the first stage, we use LLMs to generate and…
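To make the idea of a layout plan concrete, the sketch below shows one way such a plan could be represented and sanity-checked before it is handed to a renderer. It is only an illustration under stated assumptions: the class names (`Entity`, `Relation`, `DiagramPlan`), the normalized bounding-box convention, and the `problems` check are all hypothetical and are not taken from the paper, whose actual plan format is not given in the abstract above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A diagram element; hypothetical schema, not the paper's format."""
    name: str
    kind: str  # assumed values: "object" or "label"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in [0, 1]

    def center(self) -> tuple[float, float]:
        x0, y0, x1, y1 = self.bbox
        return ((x0 + x1) / 2, (y0 + y1) / 2)


@dataclass
class Relation:
    """A connection between two entities, referenced by name."""
    source: str
    target: str
    kind: str  # assumed values: "arrow" or "line"


@dataclass
class DiagramPlan:
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the plan is usable."""
        issues = []
        names = {e.name for e in self.entities}
        for e in self.entities:
            x0, y0, x1, y1 = e.bbox
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                issues.append(f"bad bbox for {e.name}")
        for r in self.relations:
            for end in (r.source, r.target):
                if end not in names:
                    issues.append(f"unknown entity {end}")
        return issues


# A tiny example plan: two objects joined by an arrow.
plan = DiagramPlan(
    entities=[
        Entity("sun", "object", (0.0, 0.25, 0.25, 0.75)),
        Entity("earth", "object", (0.5, 0.25, 1.0, 0.75)),
    ],
    relations=[Relation("sun", "earth", "arrow")],
)
```

A check like `plan.problems()` is the kind of step where a human or an automated reviewer could catch a misplaced box or a dangling arrow before rendering, which is in the spirit of the human-in-the-loop editing listed under Findings.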
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
