Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions

Roman Bedyakin; Nikolay Mikhaylovskiy

arXiv:2106.00052·eess.AS·June 2, 2021

Open Access

TL;DR

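This paper presents a convolutional neural network with self-attentive pooling and deep 1D time-channel separable convolutions that sets the state of the art for spoken language identification on the Low Resource ASR challenge dataset, and it compares the confusion-matrix structure on that dataset with that on the more diverse VoxForge dataset.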

Contribution

It introduces a novel neural network architecture combining self-attentive pooling and deep time-channel separable convolutions for low-resource language ID tasks.
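A minimal sketch of what a 1D time-channel separable convolution computes, assuming the usual meaning of the term: a depthwise convolution that filters each channel along time, followed by a pointwise (kernel size 1) convolution that mixes channels at each time step. The function names, the zero "same" padding, the stride of 1, and the absence of bias terms are assumptions made for illustration; the paper's actual layer configuration is not given on this page.

```python
def depthwise_conv1d(x, kernels):
    """Filter each channel along time with its own kernel.

    x:       list of C channels, each a list of T values.
    kernels: list of C kernels, each of odd length K.
    Uses zero padding so the output keeps length T (stride 1).
    Like deep learning frameworks, this is cross-correlation (no kernel flip).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        pad = k // 2
        padded = [0.0] * pad + list(channel) + [0.0] * pad
        out.append([
            sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(channel))
        ])
    return out


def pointwise_conv1d(x, weights):
    """Mix channels independently at every time step (kernel size 1).

    weights: list of C_out rows, each holding C_in values.
    """
    num_steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(num_steps)]
        for row in weights
    ]


def time_channel_separable_conv1d(x, kernels, weights):
    """Depthwise convolution over time, then pointwise convolution over channels."""
    return pointwise_conv1d(depthwise_conv1d(x, kernels), weights)


# Toy input: 2 channels, 4 time steps, mapped to 3 output channels.
x = [[1, 2, 3, 4], [0, 1, 0, 1]]
kernels = [[1, 1, 1], [0, 1, 0]]          # one length-3 kernel per input channel
weights = [[1, 1], [1, -1], [0.5, 0]]     # 3 output channels x 2 input channels
y = time_channel_separable_conv1d(x, kernels, weights)
```

The appeal of the factorization is the parameter count. With hypothetical sizes of 256 input channels, 256 output channels, and a kernel of 33 taps, a regular 1D convolution would need 256 × 256 × 33 = 2,162,688 weights, while the separable form needs 256 × 33 + 256 × 256 = 73,984.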

Findings

01

Achieved state-of-the-art results on the Low Resource ASR challenge dataset.

02

The model's confusion matrix reflects language similarity in diverse datasets.

03

Self-attentive pooling improves performance in low-resource language identification.
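A minimal sketch of self-attentive pooling, assuming one common formulation: each frame-level feature vector h_t receives a scalar score e_t = v · tanh(W h_t + b), the scores are normalized with a softmax over time, and the utterance-level vector is the weighted sum of the frames. The parameter names `W`, `b`, and `v` and this exact scoring function are assumptions for illustration; the variant used in the paper may differ.

```python
import math


def self_attentive_pooling(frames, W, b, v):
    """Pool a variable-length sequence of frames into one vector.

    frames: list of T feature vectors, each of dimension D.
    W:      attention projection, A rows of D values (learned in practice).
    b:      attention bias, A values.
    v:      attention context vector, A values.
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in frames:
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Attention-weighted sum of the frames.
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas


# Toy input: 3 frames of dimension 2, attention dimension 2.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W = [[0.5, -0.5], [0.1, 0.2]]
b = [0.0, 0.0]
pooled, alphas = self_attentive_pooling(frames, W, b, v=[0.0, 1.0])
```

When `v` is all zeros every frame gets the same score, so the layer reduces to plain average pooling; a trained `v` lets the network weight informative frames more heavily than, for example, silence.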

Abstract

This memo describes the NTR/TSU winning submission to the language identification track of the Low Resource ASR challenge at the Dialog2021 conference. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. Traditionally, the ASR task requires large volumes of labeled data that are unattainable for most of the world's languages, including most of the languages of Russia. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results in a low-resource setting for the language identification task, and we set a new state of the art (SOTA) on the Low Resource ASR challenge dataset. Additionally, we compare the structure of confusion matrices for this dataset and the significantly more diverse VoxForge dataset, and we state and substantiate the hypothesis that whenever the dataset is diverse enough so that the other…
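The confusion-matrix comparison mentioned in the abstract can be illustrated with a small sketch that builds a confusion matrix from language ID predictions and reports the most frequently confused pair. The labels `lang_a`, `lang_b`, `lang_c` and the predictions are made-up placeholders, not data from the paper, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, labels):
    """Rows are true languages, columns are predicted languages."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


def most_confused_pair(matrix, labels):
    """Return (count, true_label, predicted_label) for the largest off-diagonal cell."""
    best = None
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and (best is None or count > best[0]):
                best = (count, labels[i], labels[j])
    return best


# Placeholder labels and predictions, for illustration only.
labels = ["lang_a", "lang_b", "lang_c"]
true = ["lang_a", "lang_a", "lang_a", "lang_b", "lang_b", "lang_c", "lang_c", "lang_c"]
pred = ["lang_a", "lang_b", "lang_b", "lang_b", "lang_a", "lang_c", "lang_c", "lang_a"]

matrix = confusion_matrix(true, pred, labels)
worst = most_confused_pair(matrix, labels)
```

Large off-diagonal cells mark the language pairs the classifier mixes up most often, which is the kind of structure one would inspect when asking whether errors track language similarity.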

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques