Who Said What WSW 2.0? Enhanced Automated Analysis of Preschool Classroom Speech

Anchen Sun; Tiantian Feng; Gabriela Gutierrez; Juan J Londono; Anfeng Xu; Batya Elbaum; Shrikanth Narayanan; Lynn K Perry; Daniel S Messinger

arXiv:2505.09972 · eess.AS · October 27, 2025

TL;DR

This paper presents WSW2.0, an automated framework combining speech recognition and speaker classification to analyze preschool classroom speech with high accuracy, scalability, and potential to advance educational research and interventions.

Contribution

The paper introduces WSW2.0, a novel scalable system integrating wav2vec2 and Whisper models for accurate analysis of classroom speech, outperforming previous methods in both accuracy and scope.

Findings

01

Achieves high speaker classification accuracy (child vs. teacher), with a weighted F1 score of .845

02

Demonstrates moderate to high transcription quality, with word error rates (WER) of .119 for teachers and .238 for children

03

Shows strong agreement with expert annotations across multiple language features
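
The speaker classification figures above (weighted F1, accuracy, kappa) can all be derived from paired expert and system labels. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` and the toy labels are hypothetical, and it assumes the reported kappa is the standard chance-corrected Cohen's kappa.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return (accuracy, support-weighted F1, Cohen's kappa) for two label lists."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # how often each label occurs in the expert annotation
    pred_counts = Counter(y_pred)   # how often each label is predicted by the system
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n

    # Weighted F1: per-class F1 averaged with weights proportional to class support.
    weighted_f1 = 0.0
    for label in labels:
        tp = correct[label]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weighted_f1 += f1 * true_counts[label] / n

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    expected = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return accuracy, weighted_f1, kappa


# Toy example (made-up labels, not data from the paper).
expert = ["child"] * 6 + ["teacher"] * 4
system = ["child"] * 5 + ["teacher"] + ["teacher"] * 3 + ["child"]
accuracy, weighted_f1, kappa = classification_metrics(expert, system)
```

Kappa is lower than accuracy here because part of the raw agreement would occur by chance given the class proportions, which is also why the paper's kappa (.672) sits below its accuracy (.846).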

Abstract

This paper introduces an automated framework, WSW2.0, for analyzing vocal interactions in preschool classrooms, enhancing both accuracy and scalability through the integration of wav2vec2-based speaker classification and Whisper (large-v2 and large-v3) speech transcription. A total of 235 minutes of audio recordings (160 minutes from 12 children and 75 minutes from 5 teachers) were used to compare system outputs to expert human annotations. WSW2.0 achieves a weighted F1 score of .845, accuracy of .846, and an error-corrected kappa of .672 for speaker classification (child vs. teacher). Transcription quality is moderate to high, with word error rates of .119 for teachers and .238 for children. WSW2.0 exhibits relatively high absolute agreement intraclass correlations (ICC) with expert transcriptions for a range of classroom language features. These include teacher and child mean utterance…
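
For readers unfamiliar with word error rate, the sketch below shows how a WER such as .119 is conventionally computed: the word-level edit distance between the expert and system transcripts, divided by the number of words in the expert transcript. It is an illustration only; the function name `word_error_rate`, the example sentences, and the lowercase-and-split normalization are assumptions, since the abstract does not describe the paper's text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up utterance: one of six reference words is substituted.
wer = word_error_rate("can you pass the blue block", "can you pass a blue block")
```

Because the denominator is the reference length, WER can exceed 1 when the system inserts many extra words, so it is an error rate rather than a bounded proportion.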

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis