Spanish and English Phoneme Recognition by Training on Simulated Classroom Audio Recordings of Collaborative Learning Environments

Mario Esparza

arXiv:2202.10536·eess.AS·February 23, 2022

TL;DR

This paper presents a method for recognizing Spanish and English phonemes in noisy collaborative learning environments using simulated data and a low-complexity neural network, achieving competitive results with less real data.

Contribution

It introduces a novel simulated dataset generation approach and a lightweight neural network for bilingual phoneme recognition in noisy settings.

Findings

01

Achieved 0.099 PER on Speech Commands when trained on 41 English phonemes.

02

Achieved 0.7208 LER on real recordings of collaborative learning environments when trained on 36 Spanish phonemes, slightly better than Google's Speech-to-text (0.7272 LER).

03

Used significantly less real data than state-of-the-art models.

Abstract

Audio recordings of collaborative learning environments contain constant cross-talk and background noise, and these environments require speech recognition that switches dynamically between Spanish and English. To eliminate the standard requirement of large-scale ground truth, the thesis develops a simulated dataset by transforming audio transcriptions into phonemes and using 3D speaker geometry and data augmentation to generate an acoustic simulation of Spanish and English speech. The thesis also develops a low-complexity neural network for recognizing Spanish and English phonemes (available at github.com/muelitas/keywordRec). When trained on 41 English phonemes, the network achieves 0.099 PER on Speech Commands. When trained on 36 Spanish phonemes and tested on real recordings of collaborative learning environments, it achieves 0.7208 LER, slightly better than Google's Speech-to-text 0.7272 LER,…
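To make the PER figures above concrete, here is a minimal sketch of how a phoneme error rate is conventionally computed: the Levenshtein (edit) distance between the reference and predicted phoneme sequences, summed over utterances and divided by the total number of reference phonemes. This assumes the conventional definition; the thesis may normalize or aggregate differently. The function names and the ARPAbet-style phoneme labels are illustrative and are not taken from the keywordRec repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref and first j of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis is i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length (assumed PER definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_length = sum(len(r) for r in references)
    if total_length == 0:
        raise ValueError("references must contain at least one phoneme")
    return total_edits / total_length


# Illustrative example: one reference phoneme is missing from the prediction.
refs = [["HH", "AH", "L", "OW"]]
hyps = [["HH", "AH", "L"]]
print(phoneme_error_rate(refs, hyps))  # 1 edit / 4 reference phonemes = 0.25
```

Under this definition, a PER of 0.099 would correspond to roughly one edit per ten reference phonemes.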

Code & Models

Repositories

muelitas/keywordrec
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing