Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones
Antonio Gomez

TL;DR
This paper introduces a speaker identification method for noisy classroom audio that uses virtual microphones simulated from video data. It outperforms existing cloud diarization services and does not require large training datasets.
Contribution
It presents a new approach leveraging virtual microphones and cross-correlation patterns for speaker identification in single-channel classroom recordings.
Findings
Outperforms Google Cloud and AWS diarization services.
Effective in noisy, multi-speaker classroom environments.
Does not require prior large datasets for training.
Abstract
Speaker identification in noisy audio recordings, specifically those from collaborative learning environments, can be extremely challenging. There is a need to distinguish individual students talking in small groups from other students talking at the same time. To solve the problem, we assume the use of a single microphone per student group without any access to previous large datasets for training. This dissertation proposes a method of speaker identification using cross-correlation patterns associated with an array of virtual microphones, centered around the physical microphone. The virtual microphones are simulated by using approximate speaker geometry observed from a video recording. The patterns are constructed based on estimates of the room impulse responses for each virtual microphone. The correlation patterns are then used to identify the speakers. The proposed method is validated…
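The idea of matching cross-correlation patterns across a microphone array can be illustrated with a minimal sketch. This is not the dissertation's algorithm. It assumes free-field propagation, so each room impulse response reduces to a pure delay, and it synthesizes the virtual channels directly from a known test signal instead of deriving them from a single recording. The sample rate, array radius, seat coordinates, and all function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)
FS = 16000              # sample rate in Hz (assumed)

def virtual_mic_positions(center, radius, n_mics):
    """Place n_mics on a circle around the physical microphone (2-D)."""
    angles = 2 * np.pi * np.arange(n_mics) / n_mics
    ring = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return np.asarray(center, dtype=float) + radius * ring

def delay_samples(speaker_pos, mic_positions):
    """Delay-only stand-in for a room impulse response: travel time in samples."""
    dist = np.linalg.norm(mic_positions - np.asarray(speaker_pos, dtype=float), axis=1)
    return np.rint(dist / SPEED_OF_SOUND * FS).astype(int)

def simulate_mics(source, delays):
    """Return one delayed copy of the source per virtual microphone."""
    out = np.zeros((len(delays), len(source) + int(delays.max())))
    for i, d in enumerate(delays):
        out[i, d:d + len(source)] = source
    return out

def correlation_pattern(signals, ref=0):
    """Lag of the cross-correlation peak of each channel against channel ref."""
    lags = []
    for channel in signals:
        xc = np.correlate(channel, signals[ref], mode="full")
        lags.append(int(np.argmax(xc)) - (len(signals[ref]) - 1))
    return np.array(lags)

def identify_speaker(observed, candidates, mic_positions, ref=0):
    """Pick the candidate whose geometry-predicted lag pattern is closest."""
    best_name, best_err = None, np.inf
    for name, pos in candidates.items():
        d = delay_samples(pos, mic_positions)
        err = np.abs(observed - (d - d[ref])).sum()
        if err < best_err:
            best_name, best_err = name, err
    return best_name

# Hypothetical seat positions (metres), as might be estimated from video.
candidates = {"student_A": (1.0, 0.0), "student_B": (0.0, 1.0)}
mics = virtual_mic_positions(center=(0.0, 0.0), radius=0.5, n_mics=4)

# White noise stands in for speech coming from student_B's seat.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
signals = simulate_mics(source, delay_samples(candidates["student_B"], mics))

observed = correlation_pattern(signals)
identified = identify_speaker(observed, candidates, mics)
print(identified, observed)
```

In this toy setting each seat yields a distinct pattern of lags across the virtual array, which is what makes the match possible. A real classroom adds reverberation, overlapping talkers, and noise, which is why the abstract refers to estimated room impulse responses instead of pure delays.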
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
