Syntactic and Semantic Features For Code-Switching Factored Language Models
Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz

TL;DR
This paper investigates syntactic and semantic features in factored language models for code-switching speech recognition, reporting reductions of up to 10.8% in perplexity and up to 3.4% in error rate.
Contribution
The paper integrates syntactic and semantic features, such as part-of-speech tags, Brown word clusters, open-class words, and clusters of open-class word embeddings, into factored language models for code-switching speech.
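To make the idea of a factored representation concrete, the sketch below shows one way a code-switching sentence could be turned into bundles of factors (word, POS tag, Brown cluster, open-class word). It is a minimal illustration, not the authors' pipeline: the example sentence, the POS tags, the cluster IDs, the set of open-class tags, and the colon-separated `W-…:P-…` output notation are all assumptions chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical set of POS tags treated as open class (assumption for this sketch).
OPEN_CLASS_POS = {"NN", "VB", "JJ", "RB"}


@dataclass(frozen=True)
class FactoredToken:
    word: str
    pos: str
    brown_cluster: str
    open_class: str  # the word itself if it is open class, otherwise "NONE"


def factorize(words, pos_tags, brown_clusters):
    """Pair each word with its factors; unseen words get the cluster 'UNK'."""
    tokens = []
    for word, pos in zip(words, pos_tags):
        cluster = brown_clusters.get(word, "UNK")
        open_class = word if pos in OPEN_CLASS_POS else "NONE"
        tokens.append(FactoredToken(word, pos, cluster, open_class))
    return tokens


def to_factored_string(token):
    """Render one token in an illustrative tag-value notation."""
    return (f"W-{token.word}:P-{token.pos}:"
            f"C-{token.brown_cluster}:O-{token.open_class}")


# Toy Mandarin-English input; tags and cluster IDs are made up.
words = ["我", "想", "go", "shopping"]
pos_tags = ["PN", "VV", "VB", "NN"]
brown_clusters = {"我": "0010", "想": "0111", "go": "0110"}

factored = factorize(words, pos_tags, brown_clusters)
line = " ".join(to_factored_string(t) for t in factored)
print(line)
```

A factored language model can then condition on any subset of these factors for the preceding words and back off over them when a full context is unseen.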
Findings
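Brown word clusters, part-of-speech (POS) tags, and open-class words are the most effective factors for reducing perplexity on the SEAME corpus.
In ASR experiments, the model with Brown word clusters and POS tags, and the model that additionally includes clusters of open-class word embeddings, perform best.
The gains reach up to 10.8% in perplexity and 3.4% in error rate.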
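For readers unfamiliar with the metric, the following sketch computes perplexity from per-word log probabilities and expresses the change between two models as a percentage. The probabilities are toy values, and reading the reported percentages as relative to a baseline model is an assumption here, since this summary does not say how they are computed.

```python
import math


def perplexity(log_probs):
    """Perplexity = exp of the negative mean natural-log word probability."""
    if not log_probs:
        raise ValueError("need at least one word probability")
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy data: each model assigns the same probability to all four test words.
baseline_logp = [math.log(0.25)] * 4   # hypothetical baseline model
factored_logp = [math.log(0.5)] * 4    # hypothetical factored model

ppl_baseline = perplexity(baseline_logp)
ppl_factored = perplexity(factored_logp)
print(ppl_baseline, ppl_factored,
      relative_reduction(ppl_baseline, ppl_factored))
```

Lower perplexity means the model finds the test text less surprising; it does not by itself guarantee a lower recognition error rate, which is why the paper reports both.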
Abstract
This paper presents our latest investigations on different features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features which can be extracted from Code-Switching text data and integrate them into factored language models. Different possible factors, such as words, part-of-speech tags, Brown word clusters, open-class words, and clusters of open-class word embeddings, are explored. The experimental results reveal that Brown word clusters, part-of-speech tags, and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In ASR experiments, the model containing Brown word clusters and part-of-speech tags and the model also including clusters of open-class word embeddings yield the best…
