Bridging the Early Science Gap with Artificial Intelligence: Evaluating Large Language Models as Tools for Early Childhood Science Education
Annika Bush, Amin Alibakhshi

TL;DR
This study assesses four large language models' ability to generate age-appropriate science explanations for preschoolers, revealing strengths, limitations, and practical insights for early childhood education.
Contribution
It systematically evaluates LLMs' effectiveness in creating developmentally suitable science content for young children, highlighting performance differences and guiding future AI educational tools.
Findings
Claude outperformed other models in biological explanations.
All models struggled with chemical concepts.
The models differed significantly in content quality and engagement levels.
Abstract
Early childhood science education is crucial for developing scientific literacy, yet translating complex scientific concepts into age-appropriate content remains challenging for educators. Our study evaluates four leading Large Language Models (LLMs), namely GPT-4, Claude, Gemini, and Llama, on their ability to generate preschool-appropriate scientific explanations across biology, chemistry, and physics. Through systematic evaluation by 30 nursery teachers using established pedagogical criteria, we identify significant differences in the models' capabilities to create engaging, accurate, and developmentally appropriate content. Unexpectedly, Claude outperformed other models, particularly in biological topics, while all LLMs struggled with abstract chemical concepts. Our findings provide practical insights for educators leveraging AI in early science education and offer guidance for…
Taxonomy
Topics: Education Practices and Evaluation
