DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation
Vishal Kumar, Shubhra Mishra, Rebecca Hao, Rizwaan Malik, David Broman, Dorottya Demszky

TL;DR
DiagramIR is an automatic, scalable pipeline for evaluating educational math diagrams. It operates on intermediate representations (IRs) of LaTeX TikZ code, which lets smaller models evaluate diagrams about as well as larger ones at roughly a tenth of the inference cost.
Contribution
The paper presents an evaluation pipeline for math diagrams that agrees with human raters more closely than baselines such as LLM-as-a-Judge, and that lets smaller models perform comparably to larger ones.
Findings
Higher agreement with human raters than baselines such as LLM-as-a-Judge
Smaller models such as GPT-4.1-Mini perform comparably to larger models such as GPT-5
Inference cost is 10x lower when the smaller model is used
Abstract
Large Language Models (LLMs) are increasingly being adopted as tools for learning; however, most tools remain text-only, limiting their usefulness for domains where visualizations are essential, such as mathematics. Recent work shows that LLMs are capable of generating code that compiles to educational figures, but a major bottleneck remains: scalable evaluation of these diagrams. We address this by proposing DiagramIR: an automatic and scalable evaluation pipeline for geometric figures. Our method relies on intermediate representations (IRs) of LaTeX TikZ code. We compare our pipeline to other evaluation baselines such as LLM-as-a-Judge, showing that our approach has higher agreement with human raters. This evaluation approach also enables smaller models like GPT-4.1-Mini to perform comparably to larger models such as GPT-5 at a 10x lower inference cost, which is important for…
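The abstract does not spell out the IR schema or the checks the pipeline runs, so the following is only a minimal sketch of the general idea: turn TikZ code into a small structured representation, then evaluate that structure instead of the rendered image. The function names (`tikz_to_ir`, `check_ir`), the IR fields, and both checks are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import re

# Matches a numeric TikZ coordinate such as (4,3) or (2, -0.4).
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")
# Matches a \draw command up to its terminating semicolon.
DRAW = re.compile(r"\\draw(?:\[[^\]]*\])?\s*(.*?);", re.S)
# Matches \node at (x,y) {text}; with a numeric coordinate.
NODE = re.compile(
    r"\\node(?:\[[^\]]*\])?\s*at\s*" + COORD.pattern + r"\s*\{([^}]*)\}\s*;"
)


def tikz_to_ir(tikz: str) -> dict:
    """Extract a toy IR (hypothetical schema): straight-line paths and labels."""
    paths = []
    for match in DRAW.finditer(tikz):
        body = match.group(1)
        points = [(float(x), float(y)) for x, y in COORD.findall(body)]
        closed = body.rstrip().endswith("cycle")
        paths.append({"points": points, "closed": closed})
    labels = []
    for match in NODE.finditer(tikz):
        x, y, text = match.groups()
        labels.append({"text": text.strip(), "at": (float(x), float(y))})
    return {"paths": paths, "labels": labels}


def check_ir(ir: dict, margin: float = 1.0) -> list:
    """Run two illustrative rule-based checks on the toy IR."""
    xs = [x for path in ir["paths"] for x, _ in path["points"]]
    ys = [y for path in ir["paths"] for _, y in path["points"]]
    if not xs:
        return ["no shapes found"]
    issues = []
    # Check 1 (assumed rule): a path with 3+ vertices should be a closed shape.
    for index, path in enumerate(ir["paths"]):
        if len(path["points"]) >= 3 and not path["closed"]:
            issues.append(f"path {index} has 3+ vertices but is not closed")
    # Check 2 (assumed rule): labels should sit near the figure's bounding box.
    for label in ir["labels"]:
        x, y = label["at"]
        inside_x = min(xs) - margin <= x <= max(xs) + margin
        inside_y = min(ys) - margin <= y <= max(ys) + margin
        if not (inside_x and inside_y):
            issues.append(f"label {label['text']!r} is far from the figure")
    return issues


EXAMPLE = r"""
\begin{tikzpicture}
  \draw (0,0) -- (4,0) -- (4,3) -- cycle;
  \node at (2,-0.4) {4};
  \node at (9,9) {3};
\end{tikzpicture}
"""

if __name__ == "__main__":
    ir = tikz_to_ir(EXAMPLE)
    for issue in check_ir(ir):
        print(issue)
```

In this sketch the checks are plain rules over the IR. The same structured representation could instead be handed to a language model as text, which is one plausible reason a smaller model might suffice when it does not have to interpret raw code or pixels.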
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Data Visualization and Analytics · Handwritten Text Recognition Techniques
