BERT-LID: Leveraging BERT to Improve Spoken Language Identification
Yuting Nie, Junhong Zhao, Wei-Qiang Zhang, Jinfeng Bai

TL;DR
This paper introduces BERT-LID, a BERT-based system that uses phonetic posteriorgrams to significantly improve language identification accuracy, especially for short speech segments, enhancing multilingual speech system interoperability.
Contribution
It extends BERT with phonetic posteriorgrams for better short-utterance language identification, achieving notable accuracy improvements over baseline methods.
Findings
About 6.5% accuracy improvement over the baseline on long segments
About 19.9% accuracy improvement over the baseline on short segments
Effective enhancement of language ID performance for short speech
Abstract
Language identification is the task of automatically determining the identity of a language conveyed by a spoken segment. It has a profound impact on the multilingual interoperability of an intelligent speech system. Although language identification attains high accuracy on medium or long utterances (>3s), performance on short utterances (<=1s) is still far from satisfactory. We propose a BERT-based language identification system (BERT-LID) to improve language identification performance, especially on short-duration speech segments. We extend the original BERT model by taking the phonetic posteriorgrams (PPG) derived from the front-end phone recognizer as input. We then follow it with the optimal deep classifier for language identification. Our BERT-LID model can improve the baseline accuracy by about 6.5% on long-segment identification and 19.9% on short-segment…
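The abstract says only that PPGs from a front-end phone recognizer are fed to BERT; it does not say how. As a rough illustration, the sketch below shows one possible way to turn a posteriorgram into a BERT-style token sequence: take the most likely phone per frame, merge consecutive repeats, and add [CLS]/[SEP] markers. The phone inventory, the function name ppg_to_tokens, and the argmax-and-merge step are all assumptions made for this example, not the authors' confirmed method.

```python
from typing import List, Sequence

# Hypothetical phone inventory; the real recognizer's phone set is not given in the abstract.
PHONES = ["sil", "a", "k", "t"]


def ppg_to_tokens(ppg: Sequence[Sequence[float]],
                  phones: Sequence[str] = PHONES) -> List[str]:
    """Convert a posteriorgram (frames x phones) into a BERT-style token list.

    Assumed procedure (illustrative only): per-frame argmax, then merge
    consecutive identical phones, then wrap in [CLS] ... [SEP].
    """
    tokens: List[str] = []
    for frame in ppg:
        if len(frame) != len(phones):
            raise ValueError("each frame needs one posterior per phone")
        # Index of the most probable phone in this frame.
        best = max(range(len(frame)), key=lambda i: frame[i])
        phone = phones[best]
        # Skip the phone if it repeats the previous frame's decision.
        if not tokens or tokens[-1] != phone:
            tokens.append(phone)
    return ["[CLS]"] + tokens + ["[SEP]"]


# Toy posteriorgram: 5 frames, 4 phones, each row sums to 1.
ppg = [
    [0.70, 0.10, 0.10, 0.10],  # sil
    [0.10, 0.20, 0.60, 0.10],  # k
    [0.10, 0.10, 0.70, 0.10],  # k (merged with previous frame)
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
]
tokens = ppg_to_tokens(ppg)
print(tokens)
```

A short utterance yields only a handful of such tokens, which is one way to see why the short-segment case (<=1s) is the harder one: the classifier has very little phonetic context to work with.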
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Sigmoid Activation · WordPiece · Residual Connection · Layer Normalization · Dropout
