Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition

Hung-Shin Lee; Yu Tsao; Shyh-Kang Jeng; Hsin-Min Wang

arXiv:2203.15576 · cs.SD · March 30, 2022

TL;DR

This paper introduces a novel subspace-based learning framework for phonotactic language recognition, effectively capturing concealed phonotactic structures to improve language and dialect identification accuracy.

Contribution

It proposes a new subspace construction and learning method using kernel machines and neural networks, enhancing phonotactic language recognition performance.

Findings

1. Achieved up to 56% relative equal error rate (EER) reduction on NIST LRE 2007.
2. Outperformed baseline methods on dialect/accent identification.
3. Demonstrated the effectiveness of subspace-based neural networks (SNNs).

Abstract

Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution of phone events. In the present study, we propose a new learning mechanism based on subspace-based representation, which can extract concealed phonotactic structures from utterances, for language verification and dialect/accent identification. The framework mainly involves two successive parts. The first part involves subspace construction. Specifically, it decodes each utterance into a sequence of vectors filled with phone posteriors and transforms the vector sequence into a linear orthogonal subspace based on low-rank matrix factorization or dynamic linear modeling. The second part involves subspace learning based on kernel machines, such as support vector machines and the newly developed subspace-based neural networks (SNNs). The input layer of SNNs is…
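The two-part pipeline in the abstract (build a subspace per utterance, then compare subspaces with a kernel) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Truncated SVD stands in for the low-rank matrix factorization. The subspace is taken in the phone-posterior space without mean-centering. The Grassmann projection kernel ||U1ᵀU2||²_F is one standard kernel between subspaces and is not necessarily the kernel used in the paper. The function names (`posterior_subspace`, `projection_kernel`, `gram_matrix`) are hypothetical, and the posteriors are synthetic.

```python
import numpy as np


def posterior_subspace(posteriors: np.ndarray, rank: int) -> np.ndarray:
    """Return a (D, rank) matrix with orthonormal columns spanning the
    dominant subspace of a (T, D) sequence of phone-posterior vectors.

    Assumption: truncated SVD of the uncentered posterior matrix stands in
    for the paper's low-rank matrix factorization (hypothetical helper).
    """
    # Rows of vt are right singular vectors, i.e. directions in the
    # D-dimensional phone-posterior space, sorted by singular value.
    _, _, vt = np.linalg.svd(posteriors, full_matrices=False)
    return vt[:rank].T


def projection_kernel(u1: np.ndarray, u2: np.ndarray) -> float:
    """Grassmann projection kernel ||U1^T U2||_F^2, which lies in [0, rank].

    Assumption: one common subspace kernel, not necessarily the paper's.
    """
    return float(np.linalg.norm(u1.T @ u2, "fro") ** 2)


def gram_matrix(subspaces: list) -> np.ndarray:
    """Symmetric pairwise kernel matrix over a list of subspace bases."""
    n = len(subspaces)
    k = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            k[i, j] = k[j, i] = projection_kernel(subspaces[i], subspaces[j])
    return k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_phones, rank = 40, 5
    bases = []
    for num_frames in (120, 200, 90):
        # Synthetic logits; a softmax over phones makes each row a posterior.
        logits = rng.normal(size=(num_frames, num_phones))
        post = np.exp(logits - logits.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        bases.append(posterior_subspace(post, rank))
    print(gram_matrix(bases).round(3))
```

Each diagonal entry equals the subspace rank, because an orthonormal basis projected onto itself gives the identity. The resulting Gram matrix can be passed to any kernel machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Dynamic linear modeling and the SNN variant are not covered by this sketch.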

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing