Automatic Spoken Language Identification using a Time-Delay Neural Network

Benjamin Kepecs; Homayoon Beigi

arXiv:2205.09564·cs.CL·May 20, 2022

TL;DR

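This paper presents a TDNN-based system for closed-set spoken language identification across Arabic, Spanish, French, and Turkish. Using a custom multilingual language model and a per-utterance voting scheme, it reaches high accuracy for Spanish and Arabic and moderate accuracy for Turkish and French.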

Contribution

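The paper trains a TDNN acoustic model, based on the Tedlium TDNN model, together with a specialized pronunciation lexicon in which language names are prepended to phones, and uses the resulting phone alignments to identify the language of a speech recording.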

Findings

01 High accuracy in identifying Spanish and Arabic

02 Moderate accuracy for Turkish and French

03 Effective use of phone alignments and voting scheme

Abstract

Closed-set spoken language identification is the task of recognizing the language being spoken in a recorded audio clip from a set of known languages. In this study, a language identification system was built and trained to distinguish between Arabic, Spanish, French, and Turkish based on nothing more than recorded speech. A pre-existing multilingual dataset was used to train a series of acoustic models based on the Tedlium TDNN model to perform automatic speech recognition. The system was provided with a custom multilingual language model and a specialized pronunciation lexicon with language names prepended to phones. The trained model was used to generate phone alignments to test data from all four languages, and languages were predicted based on a voting scheme choosing the most common language prepend in an utterance. Accuracy was measured by comparing predicted languages to known…
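The voting scheme described in the abstract can be illustrated with a minimal sketch. It assumes that each aligned phone is a string carrying its language name as a prefix, separated by an underscore (for example `spanish_a`); that label format, the function names, and the handling of unprefixed tokens such as silence are hypothetical choices made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Optional


def language_of(phone: str, sep: str = "_") -> str:
    """Return the language prefix of a tagged phone, e.g. 'spanish_a' -> 'spanish'.

    The 'language_phone' label format is an assumption made for this sketch.
    """
    prefix, _, _ = phone.partition(sep)
    return prefix


def predict_language(aligned_phones: Iterable[str], sep: str = "_") -> Optional[str]:
    """Pick the language whose prefix occurs most often in one utterance.

    Tokens without the separator (e.g. a hypothetical silence marker 'SIL')
    carry no language prefix and are skipped. Returns None when no token
    casts a vote. Tie-breaking is not specified here and should not be
    relied on.
    """
    votes = Counter(language_of(p, sep) for p in aligned_phones if sep in p)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical alignment for a single utterance: three Spanish-tagged phones,
# one Arabic-tagged phone, and one untagged silence token.
utterance = ["SIL", "spanish_o", "spanish_l", "arabic_a", "spanish_a"]
predicted = predict_language(utterance)
```

In this sketch the utterance-level prediction depends only on the counts of language prefixes, so a few phones aligned to the wrong language do not change the outcome as long as the correct language keeps a plurality.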
