Exploring Automated Recognition of Instructional Activity and Discourse from Multimodal Classroom Data
Ivo Bueno, Ruikun Hou, Babette Bühler, Tim Fütterer, James Drimalla, Jonathan Kyle Foster, Peter Youngs, Peter Gerjets, Ulrich Trautwein, Enkelejda Kasneci

TL;DR
This paper develops AI methods to automatically recognize instructional activities and discourse in classroom videos and transcripts, aiming to enable scalable teacher feedback systems.
Contribution
It introduces modality-specific pipelines using fine-tuned models for multimodal classroom data, outperforming prompting-based approaches in recognizing instructional activities and discourse.
Findings
Fine-tuned models achieve macro-F1 scores of 0.577 on video (24 activity labels) and 0.460 on transcripts (19 discourse labels).
Automated analysis is feasible for classroom interaction recognition.
The approach supports scalable teacher feedback systems.
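For readers unfamiliar with the metric, macro-F1 in a multi-label setting is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The following is a minimal sketch, not the authors' evaluation code; the function names and the choice to score a label with no positives and no predictions as 0.0 are assumptions made for illustration.

```python
from typing import List


def f1_per_label(y_true: List[List[int]], y_pred: List[List[int]]) -> List[float]:
    """Return one F1 score per label column of binary indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-label F1 scores."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


# Toy example with 3 segments and 2 labels (illustrative data only).
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [0, 1], [0, 1]]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every label contributes equally, a low macro-F1 can reflect weak performance on a few rare labels even when frequent labels are recognized well.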
Abstract
Observation of classroom interactions can provide concrete feedback to teachers, but current methods rely on manual annotation, which is resource-intensive and hard to scale. This work explores AI-driven analysis of classroom recordings, focusing on multimodal instructional activity and discourse recognition as a foundation for actionable feedback. Using a densely annotated dataset of 164 hours of video and 68 lesson transcripts, we design parallel, modality-specific pipelines. For video, we evaluate zero-shot multimodal LLMs, fine-tuned vision-language models, and self-supervised video transformers on 24 activity labels. For transcripts, we fine-tune a transformer-based classifier with contextualized inputs and compare it against prompting-based LLMs on 19 discourse labels. To handle class imbalance and multi-label complexity, we apply per-label thresholding, context windows, and…
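The per-label thresholding mentioned in the abstract can be illustrated with a small sketch. The abstract does not state how thresholds are chosen, so this example assumes one common recipe: for each label independently, pick the cut-off from a fixed grid that maximizes F1 on held-out validation data. The names `tune_thresholds`, `apply_thresholds`, and the grid values are hypothetical.

```python
from typing import List


def f1_binary(true: List[int], pred: List[int]) -> float:
    """F1 for a single label; returns 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(y_true: List[List[int]], y_prob: List[List[float]],
                    grid: List[float]) -> List[float]:
    """Assumed recipe: per label, keep the grid value with the best validation F1."""
    thresholds = []
    for j in range(len(y_true[0])):
        true_j = [row[j] for row in y_true]
        prob_j = [row[j] for row in y_prob]
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            pred_j = [1 if p >= t else 0 for p in prob_j]
            score = f1_binary(true_j, pred_j)
            if score > best_f1:  # ties keep the earlier (lower) threshold
                best_t, best_f1 = t, score
        thresholds.append(best_t)
    return thresholds


def apply_thresholds(y_prob: List[List[float]],
                     thresholds: List[float]) -> List[List[int]]:
    """Binarize each label's probability with its own threshold."""
    return [[1 if p >= t else 0 for p, t in zip(row, thresholds)] for row in y_prob]


# Illustrative validation data: label 1 is rare and gets low probabilities.
val_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
val_prob = [[0.9, 0.2], [0.6, 0.1], [0.4, 0.3], [0.2, 0.05]]
thresholds = tune_thresholds(val_true, val_prob, grid=[0.25, 0.5, 0.75])
print(thresholds)
```

The motivation for a separate threshold per label is class imbalance: a rare label often receives systematically lower probabilities, so a single global cut-off such as 0.5 would rarely predict it.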
Taxonomy
Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Emotion and Mood Recognition
