Transducer-based language embedding for spoken language identification

Peng Shen; Xugang Lu; Hisashi Kawai

arXiv:2204.03888 · cs.CL · August 1, 2022

TL;DR

This paper introduces a transducer-based language embedding method that combines acoustic and linguistic features for spoken language identification (LID), and reports significant performance gains on the multilingual LibriSpeech and VoxLingua107 datasets.

Contribution

The paper presents a novel RNN transducer-based language embedding approach that explicitly encodes linguistic features for enhanced LID performance.
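
To make the idea of a language embedding built from two feature streams concrete, here is a minimal sketch. It is not the paper's architecture, which is not described on this page. It assumes, purely for illustration, that frame-level acoustic features and token-level linguistic features are each summarized by mean-and-standard-deviation pooling and that the two summaries are concatenated. The names `stats_pool` and `language_embedding` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def stats_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Summarize a variable-length sequence by per-dimension mean and std."""
    if not frames:
        raise ValueError("frames must contain at least one vector")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation (divide by n), per dimension.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def language_embedding(
    acoustic: Sequence[Sequence[float]],
    linguistic: Sequence[Sequence[float]],
) -> Vector:
    """Hypothetical fusion: pool each stream separately, then concatenate.

    The streams may have different lengths (frames vs. tokens) and
    different dimensions; pooling removes the length dependence.
    """
    return stats_pool(acoustic) + stats_pool(linguistic)


# Toy inputs, invented for illustration only.
acoustic_frames = [[1.0, 2.0], [3.0, 4.0]]   # 2 frames, 2 dims
linguistic_steps = [[0.0], [2.0], [4.0]]     # 3 tokens, 1 dim
embedding = language_embedding(acoustic_frames, linguistic_steps)
```

In this sketch the resulting vector has a fixed size regardless of utterance length, which is what lets a downstream classifier score languages from it.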

Findings

01. Relative improvements of 12% to 59% on in-domain datasets.

02. Relative improvements of 16% to 24% on cross-domain datasets.

03. Exploits both phonetically-aware acoustic features and explicit linguistic features.

Abstract

Acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features and lack explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed that the proposed method significantly improves performance on LID tasks, with 12% to 59% and 16% to 24% relative improvement on in-domain and cross-domain…
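
For readers unfamiliar with the term, a relative improvement is the reduction of a metric expressed as a fraction of the baseline value. The sketch below assumes a lower-is-better metric such as an error rate; the page does not state which metric the paper uses, and the numbers are invented for illustration rather than taken from the paper.

```python
def relative_improvement(baseline_error: float, proposed_error: float) -> float:
    """Reduction of an error-type metric as a fraction of the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - proposed_error) / baseline_error


# Hypothetical values, not results from the paper.
baseline = 10.0   # e.g. baseline error of 10%
proposed = 8.0    # e.g. proposed-system error of 8%
gain = relative_improvement(baseline, proposed)
print(f"{gain:.0%}")  # prints "20%"
```

Under this definition, a drop from 10% to 8% error is a 2-point absolute improvement but a 20% relative improvement.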

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques