LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction
Maoquan Zhang, Bisser Raytchev, Xiujuan Sun

TL;DR
LEARN is a novel layout-aware diffusion framework that generates pedagogically aligned STEM illustrations, supporting reasoning and reducing cognitive load through story-driven, structured visual sequences.
Contribution
LEARN introduces the first unified approach combining layout-based storytelling, semantic learning, and cognitive scaffolding for educational image generation.
Findings
Produces coherent visual sequences aligned with STEM concepts
Supports mid-to-high-level reasoning per Bloom's taxonomy
Reduces extraneous cognitive load in educational visuals
Abstract
LEARN is a layout-aware diffusion framework designed to generate pedagogically aligned illustrations for STEM education. It leverages a curated BookCover dataset that provides narrative layouts and structured visual cues, enabling the model to depict abstract and sequential scientific concepts with strong semantic alignment. Through layout-conditioned generation, contrastive visual-semantic training, and prompt modulation, LEARN produces coherent visual sequences that support mid-to-high-level reasoning in line with Bloom's taxonomy while reducing extraneous cognitive load as emphasized by Cognitive Load Theory. By fostering spatially organized and story-driven narratives, the framework counters fragmented attention often induced by short-form media and promotes sustained conceptual focus. Beyond static diagrams, LEARN demonstrates potential for integration with multimodal systems and…
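The abstract names "contrastive visual-semantic training" but does not specify the objective. The sketch below is a minimal illustration only. It assumes a CLIP-style symmetric InfoNCE loss over matched image/text embedding pairs, which is a common choice for this kind of alignment and is not confirmed by the source as LEARN's formulation. The function names and the default temperature of 0.07 are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(
    image_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical symmetric InfoNCE (CLIP-style) loss over a batch of pairs.

    Assumption: image_emb[i] and text_emb[i] describe the same concept (the
    positive pair); every other pairing in the batch acts as a negative.
    """
    if not image_emb or len(image_emb) != len(text_emb):
        raise ValueError("need a non-empty batch of matched image/text pairs")
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose so each text is also scored against every image.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits_t))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9]]
    aligned = contrastive_loss(images, texts)
    swapped = contrastive_loss(images, texts[::-1])
    print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

In this toy batch, correctly paired embeddings produce a much lower loss than the same embeddings with their captions swapped. Minimizing this kind of objective pulls matching visual and semantic representations together and pushes mismatched ones apart.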
