LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction

Maoquan Zhang; Bisser Raytchev; Xiujuan Sun

arXiv:2508.11153·cs.CV·August 18, 2025

TL;DR

LEARN is a novel layout-aware diffusion framework that generates pedagogically aligned STEM illustrations, supporting reasoning and reducing cognitive load through story-driven, structured visual sequences.

Contribution

The paper introduces what it presents as the first unified approach to educational image generation, combining layout-based storytelling, semantic learning, and cognitive scaffolding.

Findings

01. Produces coherent visual sequences aligned with STEM concepts
02. Supports mid-to-high-level reasoning per Bloom's taxonomy
03. Reduces extraneous cognitive load in educational visuals

Abstract

LEARN is a layout-aware diffusion framework designed to generate pedagogically aligned illustrations for STEM education. It leverages a curated BookCover dataset that provides narrative layouts and structured visual cues, enabling the model to depict abstract and sequential scientific concepts with strong semantic alignment. Through layout-conditioned generation, contrastive visual-semantic training, and prompt modulation, LEARN produces coherent visual sequences that support mid-to-high-level reasoning in line with Bloom's taxonomy while reducing extraneous cognitive load as emphasized by Cognitive Load Theory. By fostering spatially organized and story-driven narratives, the framework counters fragmented attention often induced by short-form media and promotes sustained conceptual focus. Beyond static diagrams, LEARN demonstrates potential for integration with multimodal systems and…
