The Obscure Limitation of Modular Multilingual Language Models
Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Ayu Purwarianti

TL;DR
This paper investigates the limitations of modular multilingual language models in real-world scenarios involving unknown languages, highlighting the impact of language identification modules on their performance.
Contribution
It reveals the performance gap caused by LID modules in modular MLMs and discusses potential ways to address this issue.
Findings
LID modules affect multilingual inference performance
Existing evaluations overlook LID impact
Discussion on closing the performance gap
Abstract
We expose a limitation of modular multilingual language models (MLMs) in multilingual inference scenarios where the input language is unknown. Existing evaluations of modular MLMs exclude language identification (LID) modules, which obscures how modular MLMs perform in real-world multilingual scenarios. In this work, we show the effect of adding LID to the multilingual evaluation of modular MLMs and discuss ways to close the performance gap caused by the pipelined approach of LID and modular MLMs.
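The following minimal toy sketch illustrates the gap the abstract describes. A modular model routes each input to a language-specific module (for example, a language adapter). An "oracle" evaluation supplies the gold language. A pipelined evaluation must first predict the language with an LID module. Everything here is hypothetical: `toy_lid`, `en_module`, `id_module`, the cue words, and the four examples stand in for a real LID model and real language-specific modules. The printed numbers show the mechanism only and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical toy components for illustration only; not the paper's models.
ID_LID_CUES = {"yang", "dan", "tidak", "bagus"}  # words the toy LID treats as Indonesian


@dataclass
class Example:
    text: str
    gold_lang: str  # true language of the input
    label: int      # gold task label (1 = positive, 0 = negative)


def toy_lid(text: str) -> str:
    """Toy language identifier: 'id' if any cue word appears, otherwise 'en'."""
    return "id" if ID_LID_CUES & set(text.split()) else "en"


def en_module(text: str) -> int:
    """Toy English-specific module: positive iff 'good' appears."""
    return int("good" in text.split())


def id_module(text: str) -> int:
    """Toy Indonesian-specific module: positive iff 'bagus' or 'keren' appears."""
    return int(bool({"bagus", "keren"} & set(text.split())))


MODULES: Dict[str, Callable[[str], int]] = {"en": en_module, "id": id_module}


def evaluate(examples: List[Example], lid: Callable[[str], str],
             modules: Dict[str, Callable[[str], int]], use_oracle_lang: bool) -> float:
    """Accuracy when the module is selected by the gold language or by LID."""
    correct = 0
    for ex in examples:
        # Oracle setting: the gold language is given, as in LID-free evaluations.
        # Pipeline setting: the language must first be predicted by the LID module.
        lang = ex.gold_lang if use_oracle_lang else lid(ex.text)
        correct += int(modules[lang](ex.text) == ex.label)
    return correct / len(examples)


EXAMPLES = [
    Example("the movie is good", "en", 1),
    Example("filmnya bagus", "id", 1),
    Example("filmnya keren", "id", 1),  # no cue word, so the toy LID predicts 'en'
    Example("saya tidak suka", "id", 0),
]

oracle = evaluate(EXAMPLES, toy_lid, MODULES, use_oracle_lang=True)
pipeline = evaluate(EXAMPLES, toy_lid, MODULES, use_oracle_lang=False)
print(f"oracle={oracle:.2f} pipeline={pipeline:.2f} gap={oracle - pipeline:.2f}")
# prints: oracle=1.00 pipeline=0.75 gap=0.25
```

The two settings differ only in where the language label comes from. In this toy, any accuracy drop is therefore attributable entirely to LID errors routing inputs to the wrong module.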
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
