Fine-Grained Classroom Activity Detection from Audio with Neural Networks
Eric Slyman, Chris Daw, Morgan Skrabut, Ana Usenko, Brian Hutchinson

TL;DR
This paper develops and evaluates neural network models for automatic detection of classroom activities from audio recordings, achieving state-of-the-art accuracy and robust generalization across instructors and activity types.
Contribution
It introduces deep neural architectures and feature comparisons for fine-grained classroom activity detection, advancing the automation of classroom behavior analysis.
Findings
Achieved 6.2% frame-level error rate on 4-way classification for unseen instructors.
Reduced aggregate time estimation error by 54.9% relative to the baseline.
Demonstrated robust generalization across different instructors and activity classes.
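The two headline metrics can be illustrated with a short sketch. The page does not define either metric precisely, so the definitions below are assumptions. Frame-level error rate is taken as the fraction of frames whose predicted activity label differs from the reference. The aggregate time error is a hypothetical stand-in: the summed absolute difference, over all labels, between estimated and true total time per activity. The helper names (frame_error_rate, aggregate_time_error, relative_reduction) and the frame_seconds parameter are illustrative only, not the authors' code.

```python
from collections import Counter


def frame_error_rate(reference, hypothesis):
    """Fraction of frames whose predicted label differs from the reference label."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        raise ValueError("need at least one frame")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def aggregate_time_error(reference, hypothesis, frame_seconds=1.0):
    """Hypothetical aggregate metric: sum over labels of |estimated time - true time|.

    Each frame is assumed to cover frame_seconds of audio, so a label's total
    time is its frame count multiplied by frame_seconds.
    """
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    labels = set(ref_counts) | set(hyp_counts)
    return frame_seconds * sum(abs(hyp_counts[label] - ref_counts[label]) for label in labels)


def relative_reduction(baseline_error, system_error):
    """Relative error reduction, e.g. 0.549 for a 54.9% reduction."""
    return (baseline_error - system_error) / baseline_error


if __name__ == "__main__":
    ref = ["lecture"] * 6 + ["group work"] * 3 + ["student question"]
    hyp = ["lecture"] * 5 + ["group work"] * 4 + ["student question"]
    print(frame_error_rate(ref, hyp))      # one mislabeled frame out of ten
    print(aggregate_time_error(ref, hyp))  # lecture off by 1 frame, group work off by 1
```

Note that the two metrics can disagree: frame errors that swap labels in both directions cancel out in per-activity totals, so a system can have a nonzero frame error rate yet a small aggregate time error.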
Abstract
Instructors are increasingly incorporating student-centered learning techniques in their classrooms to improve learning outcomes. In addition to lecture, these class sessions involve forms of individual and group work, and greater rates of student-instructor interaction. Quantifying classroom activity is a key element of accelerating the evaluation and refinement of innovative teaching practices, but manual annotation does not scale. In this manuscript, we present advances to the young application area of automatic classroom activity detection from audio. Using a university classroom corpus with nine activity labels (e.g., "lecture," "group work," "student question"), we propose and evaluate deep fully connected, convolutional, and recurrent neural network architectures, comparing the performance of mel-filterbank, OpenSmile, and self-supervised acoustic features. We compare 9-way…
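As a rough illustration of one of the compared feature types, the sketch below builds a triangular mel filterbank and applies it to a single frame's power spectrum to get log mel-filterbank energies. The page does not give the paper's feature configuration, so the HTK-style mel formula, 40 filters, 16 kHz sample rate, and 512-point FFT are assumptions. Windowing and the FFT that produces the power spectrum are omitted. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters=40, n_fft=512, sample_rate=16000, f_min=0.0, f_max=None):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_filters + 2 points equally spaced on the mel scale give each filter's
    # left edge, peak, and right edge.
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_hi - mel_lo) / (n_filters + 1)
    hz_points = [mel_to_hz(mel_lo + i * step) for i in range(n_filters + 2)]
    # Frequency (Hz) of each non-negative FFT bin.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for i in range(1, n_filters + 1):
        left, center, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        weights = []
        for f in bin_freqs:
            if left < f <= center:
                weights.append((f - left) / (center - left))    # rising edge
            elif center < f < right:
                weights.append((right - f) / (right - center))  # falling edge
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def log_mel_energies(power_spectrum, filters, floor=1e-10):
    """Weight one frame's power spectrum by each filter, sum, and take the log."""
    return [
        math.log(max(sum(w * p for w, p in zip(weights, power_spectrum)), floor))
        for weights in filters
    ]


if __name__ == "__main__":
    bank = mel_filterbank()
    flat_spectrum = [1.0] * 257  # placeholder power spectrum for one frame
    features = log_mel_energies(flat_spectrum, bank)
    print(len(bank), len(bank[0]), len(features))
```

Stacking such per-frame vectors over time yields the kind of input a frame-level fully connected, convolutional, or recurrent classifier could consume; the log compresses the large dynamic range of audio energy.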
Taxonomy
Topics: Music and Audio Processing · Communication in Education and Healthcare · Video Analysis and Summarization
