Multi-Scale Temporal Convolution Network for Classroom Voice Detection

Lu Ma; Xintian Wang; Song Yang; Yaguang Gong; Zhongqin Wu

arXiv:2105.14717 · cs.SD · June 1, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a multi-scale temporal convolution network for classifying classroom voice signals into four categories, improving the extraction of assistant teacher voices amidst interference for better downstream speech processing.

Contribution

It proposes a novel multi-scale temporal convolution neural network with dilated convolutions for frame-level sound event detection in classroom environments, enhancing voice classification accuracy.
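
As a rough illustration of the idea, the sketch below applies the same 1-D kernel at several dilation rates and stacks the results per frame, so each frame gets features that see short and long temporal contexts at once. It is a minimal pure-Python sketch, not the authors' architecture: the function names, the kernel weights, and the dilation rates (1, 2, 4) are assumptions chosen for illustration, and a real model would learn its kernels and operate on multi-channel acoustic features.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded ("same" length) 1-D dilated convolution over frames.

    Kernel taps are spaced `dilation` frames apart, so a larger dilation
    covers a wider temporal context with the same number of weights.
    """
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_features(signal: Sequence[float], kernel: Sequence[float],
                         dilations: Sequence[int]) -> List[List[float]]:
    """Run one branch per dilation rate and stack the outputs per frame.

    Returns one feature vector per input frame, with one entry per branch,
    which is the shape a frame-level classifier would consume.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [list(frame) for frame in zip(*branches)]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of frames spanned by a single dilated kernel."""
    return (kernel_size - 1) * dilation + 1


# Hypothetical input: a single impulse in the middle of nine frames.
frames = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
features = multi_scale_features(frames, kernel=[1.0, 1.0, 1.0], dilations=[1, 2, 4])
for t, vec in enumerate(features):
    print(t, vec)
```

With a kernel of size 3, the three branches span 3, 5, and 9 frames respectively, which is why the impulse shows up at increasingly distant frames in the higher-dilation branches.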

Findings

01. High precision and recall on simulated data

02. Effective in real-world classroom recordings

03. Outperforms classical classification methods

Abstract

Teaching through cooperation between an expert teacher and an assistant teacher, the so-called "double-teachers classroom", is becoming more prevalent in K-12 education: the course is given by the expert online and presented on a projection screen in the classroom, while the teacher in the classroom acts as an assistant who guides the students in learning. To monitor teaching quality, a microphone clipped to the assistant's neckline is always used for voice recording, and the recording is then fed to the downstream tasks of automatic speech recognition (ASR) and natural language processing (NLP). However, besides the assistant's voice, the recording also contains interfering voices, including the expert's and the students'. Here, we propose to extract the assistant's voice from the perspective of sound event detection, i.e., the voices are classified into four…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hand Gesture Recognition Systems