Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu, Bin Fu

TL;DR
This paper provides a comprehensive framework for developing and deploying multilingual large language models, addressing technical, linguistic, and societal challenges to promote inclusive AI across diverse languages.
Contribution
It offers an end-to-end development pipeline, detailed optimization strategies using Llama2, and an interdisciplinary analysis of multilingual AI development.
Findings
The survey reports that 88.38% of the world's languages are low-resource, affecting more than a billion speakers.
Practical solutions are examined through applications like customer service and machine translation.
The survey synthesizes theoretical and practical insights for inclusive multilingual AI development.
Abstract
Multilingual Large Language Models (MLLMs) represent a pivotal advancement in democratizing artificial intelligence across linguistic boundaries. While theoretical foundations are well-established, practical implementation guidelines remain scattered. This work bridges this gap by providing a comprehensive end-to-end framework for developing and deploying MLLMs in production environments. We make three distinctive contributions: First, we present an actionable pipeline from data pre-processing through deployment, integrating insights from academic research and industrial applications. Second, using Llama2 as a case study, we provide detailed optimization strategies for enhancing multilingual capabilities, including curriculum learning approaches for balancing high-resource and low-resource languages, tokenization strategies, and effective sampling methods. Third, we offer an…
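The abstract mentions sampling methods for balancing high-resource and low-resource languages but, as excerpted here, does not say which ones. As an illustration only, the sketch below shows exponent-smoothed (temperature-based) sampling, a technique commonly used in multilingual pretraining, where language i is drawn with probability proportional to (n_i / N) ** alpha. The function name, the corpus sizes, and the value alpha = 0.3 are assumptions made for this example, not details taken from the paper.

```python
def sampling_probabilities(token_counts, alpha=0.3):
    """Return per-language sampling probabilities p_i proportional to (n_i / N) ** alpha.

    alpha = 1.0 reproduces the raw corpus proportions; smaller values
    flatten the distribution and upsample low-resource languages.
    """
    total = sum(token_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in token_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical corpus sizes in tokens (illustrative values, not from the paper).
counts = {"en": 1_000_000, "sw": 10_000, "yo": 1_000}

raw = sampling_probabilities(counts, alpha=1.0)       # proportional to corpus size
smoothed = sampling_probabilities(counts, alpha=0.3)  # flattened distribution

for lang in counts:
    print(f"{lang}: raw={raw[lang]:.4f} smoothed={smoothed[lang]:.4f}")
```

With a smaller alpha, the low-resource languages are seen more often during training than their raw share of the corpus would allow, at the cost of repeating their data more frequently.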
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
