Methods of Automatic Matrix Language Determination for Code-Switched Speech

Olga Iakovenko; Thomas Hain

arXiv:2410.02521·cs.CL·December 3, 2024

TL;DR

This paper introduces methods for automatically determining the Matrix Language of code-switched speech from acoustic and textual cues. The audio-based predictors outperform acoustic language identification, and the analysis indicates that non-English languages are preferred as the Matrix Language.

Contribution

It develops Matrix Language Identity (MLID) systems based on the Matrix Language Frame (MLF) theory for code-switched speech. These systems outperform conventional acoustic language identification (LID) in F1 macro and correlation.

Findings

01

MLID predictors from audio outperform acoustic LID on the MLID recognition task, as measured by F1 macro (60%) and correlation score (0.38).

02

Non-English languages are preferred as the Matrix Language in code-switching.

03

The approach enhances understanding of language structure in code-switched speech.

Abstract

Code-switching (CS) is the practice of speakers alternating between two or more languages, and it is becoming increasingly common in the modern world. To better describe CS speech, the Matrix Language Frame (MLF) theory introduces the concept of a Matrix Language: the language that provides the grammatical structure for a CS utterance. In this work, the MLF theory was used to develop systems for Matrix Language Identity (MLID) determination. The MLID of English/Mandarin and English/Spanish CS text and speech was compared to acoustic language identity (LID), which is the typical way to identify the language of monolingual utterances. MLID predictors from audio show higher correlation with the textual principles than LID in all cases, and they also outperform LID in an MLID recognition task based on F1 macro (60%) and correlation score (0.38). This novel approach has identified that…
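The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure is used, so the phi coefficient below (for two labels it equals both the Pearson correlation and the Matthews correlation coefficient) is an assumption, as are the function names and the toy labels. This is not the authors' evaluation code.

```python
from math import sqrt


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (macro averaging)."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A label with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def phi_coefficient(gold, pred, positive):
    """Phi coefficient between two binary labelings.

    Assumption: the abstract's "correlation score" is not specified;
    phi is used here purely for illustration.
    """
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    tn = sum(g != positive and p != positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-utterance Matrix Language labels (toy data, not from the paper).
gold_mlid = ["en", "en", "zh", "zh"]
pred_mlid = ["en", "zh", "zh", "zh"]

f1 = macro_f1(gold_mlid, pred_mlid)
phi = phi_coefficient(gold_mlid, pred_mlid, positive="en")
```

Macro averaging gives each language equal weight regardless of how often it occurs. That matters if one language dominates as the Matrix Language, because plain accuracy would then reward always predicting the majority label.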
