CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement

Dikshant Kukreja; Kshitij Sah; Karan Goyal; Mukesh Mohania; Vikram Goyal

arXiv:2604.09691·cs.CV·April 14, 2026

TL;DR

CAGE (Code-Anchored Generative Enhancement) combines LLM code synthesis with diffusion models to generate educational diagrams that are both accurate and visually appealing, addressing the accuracy-aesthetics gap in existing approaches.

Contribution

The paper introduces CAGE, a pipeline that uses LLMs and diffusion models to produce accurate, visually rich educational diagrams, and releases a new dataset, EduDiagram-2K.

Findings

01. CAGE outperforms existing methods in label fidelity and visual quality.

02. Quantitative and human evaluations confirm CAGE's effectiveness.

03. The EduDiagram-2K dataset enables further research in diagram generation.

Abstract

Educational diagrams -- labeled illustrations of biological processes, chemical structures, physical systems, and mathematical concepts -- are essential cognitive tools in K-12 instruction. Yet no existing method can generate them both accurately and engagingly. Open-source diffusion models produce visually rich images but catastrophically garble text labels. Code-based generation via LLMs guarantees label correctness but yields visually flat outputs. Closed-source APIs partially bridge this gap but remain unreliable and prohibitively expensive at educational scale. We quantify this accuracy-aesthetics dilemma across all three paradigms on 400 K-12 diagram prompts, measuring both label fidelity and visual quality through complementary automated and human evaluation protocols. To resolve it, we propose CAGE (Code-Anchored Generative Enhancement): an LLM synthesizes executable code…
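
To make the notion of label fidelity more concrete, here is a minimal sketch. It is not the paper's metric: it assumes fidelity is scored as the fraction of intended labels that appear, after case and whitespace normalization, among the strings read back from a rendered image (for example by OCR). The function name `label_fidelity` and all example strings are hypothetical.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def label_fidelity(expected_labels: list[str], recovered_text: list[str]) -> float:
    """Fraction of expected labels found exactly among the recovered strings.

    Assumption: `recovered_text` holds strings read back from the image
    (e.g. via OCR). The paper's actual evaluation protocol may differ.
    """
    if not expected_labels:
        return 1.0  # nothing to get wrong
    recovered = {_normalize(t) for t in recovered_text}
    hits = sum(1 for label in expected_labels if _normalize(label) in recovered)
    return hits / len(expected_labels)


# Hypothetical example: labels for a cell diagram.
expected = ["Nucleus", "Cell membrane", "Mitochondrion"]

# Code-rendered output places the label strings verbatim.
code_rendered = ["Nucleus", "Cell  Membrane", "Mitochondrion"]

# A diffusion-style output with garbled text (invented for illustration).
diffusion_rendered = ["Nuclues", "Cel membrne", "Mitochondrion"]

print(label_fidelity(expected, code_rendered))
print(label_fidelity(expected, diffusion_rendered))
```

Exact matching is deliberately strict here, so a single misspelled character counts as a miss; a softer variant could use edit distance instead, at the cost of tolerating some garbling.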
