Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos
Yixuan Shen, Peng He, Honglu Liu, Jinxuan Fan, Yuyang Ji, Tingting Li, Tianlong Chen, Kaidi Xu, and Feng Liu

TL;DR
This paper introduces SciIBI, a video benchmark for analyzing K-12 science classroom discourse. It shows that current multimodal LLMs are limited in their pedagogical reasoning and emphasizes the need for human-AI collaboration.
Contribution
The paper presents SciIBI, the first video benchmark for science classroom discourse, and evaluates eight state-of-the-art LLMs and multimodal LLMs on it, highlighting their difficulty with pedagogical reasoning and the inconsistent effect of video input.
Findings
Models often rely on surface shortcuts rather than true understanding.
Adding video input yields inconsistent improvements across architectures.
Current models cannot reliably distinguish pedagogically similar practices.
Abstract
K-12 science classrooms are rich sites of inquiry where students coordinate phenomena, evidence, and explanatory models through discourse; yet, the multimodal complexity of these interactions has made automated analysis elusive. Existing benchmarks for classroom discourse focus primarily on mathematics and rely solely on transcripts, overlooking the visual artifacts and model-based reasoning emphasized by the Next Generation Science Standards (NGSS). We address this gap with SciIBI, the first video benchmark for analyzing science classroom discourse, featuring 113 NGSS-aligned clips annotated with Core Instructional Practices (CIP) and sophistication levels. By evaluating eight state-of-the-art LLMs and Multimodal LLMs, we reveal fundamental limitations: current models struggle to distinguish pedagogically similar practices, suggesting that CIP coding requires instructional reasoning…
Taxonomy
Topics: Science Education and Pedagogy · Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming
