DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

Abhay Zala; Han Lin; Jaemin Cho; Mohit Bansal

arXiv:2310.12128 · cs.CV · July 16, 2024 · 1 citation

Open Access · 1 dataset

TL;DR

DiagrammerGPT is a two-stage framework that leverages large language models to generate accurate, complex diagrams with detailed object layouts and labels, surpassing existing text-to-image models.

Contribution

The paper introduces a novel two-stage diagram generation approach using LLMs for planning and a dedicated diagram generator, along with a new benchmark dataset for evaluation.

Findings

01 Outperforms existing T2I models in diagram accuracy
02 Enables open-domain and multi-platform diagram generation
03 Supports human-in-the-loop editing and multimodal planning

Abstract

Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows/lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines, and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework leveraging the layout guidance capabilities of LLMs to generate more accurate diagrams. In the first stage, we use LLMs to generate and…
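To make the idea of an explicit layout plan concrete, here is a minimal sketch of how such a plan might be represented and sanity-checked before it is handed to a diagram generator. The abstract is truncated and does not specify the plan format, so everything here is an assumption for illustration: the `Entity`, `Relation`, and `DiagramPlan` names, the normalized `(x, y, w, h)` box convention, and the checks themselves are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A drawable item: an object or a text label (hypothetical schema)."""
    name: str
    kind: str  # "object" or "text_label"
    box: tuple[float, float, float, float]  # (x, y, w, h), normalized to [0, 1]


@dataclass
class Relation:
    """A connection between two entities, e.g. an arrow or a line."""
    source: str
    target: str
    kind: str  # "arrow", "line", or "labels"


@dataclass
class DiagramPlan:
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the plan passed."""
        issues = []
        names = {e.name for e in self.entities}
        for e in self.entities:
            x, y, w, h = e.box
            inside = x >= 0 and y >= 0 and w > 0 and h > 0 and x + w <= 1 and y + h <= 1
            if not inside:
                issues.append(f"{e.name}: box outside canvas")
        for r in self.relations:
            for end in (r.source, r.target):
                if end not in names:
                    issues.append(f"relation references unknown entity {end!r}")
        return issues


# Illustrative plan; the arrow to "moon" is deliberately dangling.
plan = DiagramPlan(
    entities=[
        Entity("sun", "object", (0.05, 0.05, 0.2, 0.2)),
        Entity("earth", "object", (0.6, 0.5, 0.2, 0.2)),
        Entity("earth_label", "text_label", (0.6, 0.72, 0.2, 0.06)),
    ],
    relations=[
        Relation("sun", "earth", "arrow"),
        Relation("earth_label", "earth", "labels"),
        Relation("sun", "moon", "arrow"),
    ],
)

for issue in plan.problems():
    print(issue)
```

A structured plan like this is what makes layout control and editing tractable: a missing object or an out-of-bounds box can be caught and corrected in the plan itself, before any pixels are generated.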

Code & Models

Datasets

abhayzala/AI2D-Caption
dataset · 126 dl

Taxonomy

Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques