How well do Large Language Models Recognize Instructional Moves? Establishing Baselines for Foundation Models in Educational Discourse
Kirk Vanacore, Rene F. Kizilcec

TL;DR
This study evaluates how well large language models can classify instructional moves in educational transcripts without customization, revealing moderate performance that improves with prompt design but remains limited in reliability.
Contribution
It provides the first baseline assessment of foundation models' ability to interpret authentic educational discourse using standard prompting methods.
Findings
Few-shot prompting improves classification performance.
Performance varies significantly across different instructional moves.
Higher recall often increases false positives.
Abstract
Large language models (LLMs) are increasingly adopted in educational technologies for a variety of tasks, from generating instructional materials and assisting with assessment design to tutoring. While prior work has investigated how models can be adapted or optimized for specific tasks, far less is known about how well LLMs perform at interpreting authentic educational scenarios without significant customization. As LLM-based systems become widely adopted by learners and educators in everyday academic contexts, understanding their out-of-the-box capabilities is increasingly important for setting expectations and benchmarking. We compared six LLMs to estimate their baseline performance on a simple but important task: classifying instructional moves in authentic classroom transcripts. We evaluated typical prompting methods: zero-shot, one-shot, and few-shot prompting. We found that while…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIntelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Text Readability and Simplification
