CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement
Dikshant Kukreja, Kshitij Sah, Karan Goyal, Mukesh Mohania, Vikram Goyal

TL;DR
CAGE combines LLM code synthesis with diffusion models to generate educational diagrams that are both accurate and visually appealing, addressing the accuracy-aesthetics gap in existing approaches.
Contribution
The paper introduces CAGE, a new pipeline that uses LLMs and diffusion models to produce high-quality, accurate educational diagrams, and a new dataset, EduDiagram-2K.
Findings
CAGE outperforms existing methods in label fidelity and visual quality.
Quantitative and human evaluations confirm CAGE's effectiveness.
The EduDiagram-2K dataset enables further research in diagram generation.
Abstract
Educational diagrams -- labeled illustrations of biological processes, chemical structures, physical systems, and mathematical concepts -- are essential cognitive tools in K-12 instruction. Yet no existing method can generate them both accurately and engagingly. Open-source diffusion models produce visually rich images but catastrophically garble text labels. Code-based generation via LLMs guarantees label correctness but yields visually flat outputs. Closed-source APIs partially bridge this gap but remain unreliable and prohibitively expensive at educational scale. We quantify this accuracy-aesthetics dilemma across all three paradigms on 400 K-12 diagram prompts, measuring both label fidelity and visual quality through complementary automated and human evaluation protocols. To resolve it, we propose CAGE (Code-Anchored Generative Enhancement): an LLM synthesizes executable code…
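The abstract mentions measuring label fidelity but, as excerpted here, does not define the metric. As a rough illustration only, one plausible reading is the fraction of intended labels that appear verbatim (after case and whitespace normalization) among the text strings recovered from a rendered diagram. The function names `normalize` and `label_fidelity` and the exact-match rule are assumptions made for this sketch, not the paper's protocol.

```python
from typing import Iterable


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(label.lower().split())


def label_fidelity(expected: Iterable[str], recognized: Iterable[str]) -> float:
    """Fraction of expected labels found among the recognized strings.

    Assumption: a label counts as correct only on an exact match after
    normalization. The paper may use a different (e.g. fuzzy) criterion.
    """
    expected = list(expected)
    if not expected:
        return 1.0  # nothing to get wrong
    found = {normalize(r) for r in recognized}
    hits = sum(1 for e in expected if normalize(e) in found)
    return hits / len(expected)


if __name__ == "__main__":
    # Hypothetical example: one label rendered correctly, one garbled.
    intended = ["Nucleus", "Cell Membrane"]
    read_back = ["nucleus", "cel membrne"]
    print(label_fidelity(intended, read_back))  # 0.5
```

Under this reading, a code-rendered diagram would score 1.0 whenever every label is drawn exactly as specified, while a garbled diffusion output would score lower in proportion to the labels it corrupts.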
