Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu, Ana-Cristina Rogoz, Mihai Sorin Stupariu, Radu Tudor Ionescu

TL;DR
This survey reviews techniques for adapting large multimodal models to low-resource languages, highlighting the importance of visual information and identifying key challenges such as hallucination and computational efficiency.
Contribution
It provides a comprehensive categorization and analysis of 117 studies on LMM adaptation for low-resource languages, offering insights into current methods and challenges.
Findings
Visual information often improves model performance in low-resource (LR) settings
Significant challenges include hallucination and computational efficiency
Works are systematically categorized into resource-oriented and method-oriented contributions
Abstract
In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies across 96 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We categorize works into resource-oriented and method-oriented contributions, further dividing contributions into relevant sub-categories. We compare method-oriented contributions in terms of performance and efficiency, discussing benefits and limitations of representative studies. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination…
