The Obscure Limitation of Modular Multilingual Language Models

Muhammad Farid Adilazuarda; Samuel Cahyawijaya; Ayu Purwarianti

arXiv:2311.12375 · cs.CL · November 22, 2023 · 2 cites

TL;DR

This paper investigates the limitations of modular multilingual language models in real-world scenarios involving unknown languages, highlighting the impact of language identification modules on their performance.

Contribution

It reveals the performance gap that language identification (LID) modules introduce in modular multilingual language models (MLMs) and discusses potential ways to address it.

Findings

01 LID modules affect multilingual inference performance

02 Existing evaluations overlook LID impact

03 Discussion on closing the performance gap

Abstract

We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages. Existing evaluations of modular MLMs exclude language identification (LID) modules, which obscures how modular MLMs perform in realistic multilingual scenarios. In this work, we show the effect of adding LID to the multilingual evaluation of modular MLMs and discuss ways to close the performance gap caused by the pipelined approach of LID and modular MLMs.
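
To make the evaluation gap described in the abstract concrete, here is a minimal toy sketch, not the authors' code or data. It compares two ways of routing an input to a per-language module: using the gold language label (the setting the abstract says existing evaluations assume) and using a predicted label from an LID step (the pipelined setting). Every name, the keyword-based "modules", the `toy_lid` function, and the five examples are hypothetical and chosen only so that one input gets misrouted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: (text, true language, gold sentiment label).
Example = Tuple[str, str, int]
EXAMPLES: List[Example] = [
    ("good movie", "en", 1),
    ("bad movie", "en", 0),
    ("good film", "en", 1),   # English text that the toy LID will misroute
    ("film bagus", "id", 1),
    ("film buruk", "id", 0),
]

# Stand-ins for language-specific modules: each only "understands"
# the positive keyword of its own language.
MODULES: Dict[str, Callable[[str], int]] = {
    "en": lambda text: int("good" in text),
    "id": lambda text: int("bagus" in text),
}


def toy_lid(text: str) -> str:
    """Deliberately imperfect language identifier (an assumption for this sketch)."""
    return "id" if "film" in text else "en"


def accuracy(examples: List[Example], route: Callable[[str, str], str]) -> float:
    """Score the modules when `route` picks which language module to use."""
    correct = 0
    for text, true_lang, gold in examples:
        module = MODULES[route(text, true_lang)]
        correct += int(module(text) == gold)
    return correct / len(examples)


# Oracle routing: the gold language label selects the module.
oracle_acc = accuracy(EXAMPLES, lambda text, true_lang: true_lang)
# Pipelined routing: the LID prediction selects the module.
pipelined_acc = accuracy(EXAMPLES, lambda text, true_lang: toy_lid(text))

print(f"oracle language: {oracle_acc:.2f}")
print(f"LID pipeline:    {pipelined_acc:.2f}")
```

In this toy setup the only difference between the two scores comes from the single input that `toy_lid` sends to the wrong module, which is the kind of error an evaluation with gold language labels cannot see. The numbers it prints illustrate the mechanism only and say nothing about the results reported in the paper.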

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis