MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs

Yufei Gao; Jiaying Fei; Nuo Chen; Ruirui Chen; Guohang Yan; Yunshi Lan; Botian Shi

arXiv:2508.05502 · cs.CV · December 10, 2025

TL;DR

This paper introduces MELLA, a multimodal dataset designed to improve multimodal large language models (MLLMs) in low-resource languages by strengthening both their linguistic skills and their cultural awareness through targeted data collection and fine-tuning.

Contribution

The study presents MELLA, a novel multimodal, multilingual dataset that improves MLLM performance on low-resource languages by targeting both cultural groundedness and linguistic capability.

Findings

1. Fine-tuning on MELLA improved performance across eight languages.

2. Models produce more detailed "thick descriptions" that include cultural context.

3. The gains stem from improvements in both cultural knowledge and linguistic capability.

Abstract

Multimodal Large Language Models (MLLMs) have shown remarkable performance in high-resource languages. However, their effectiveness diminishes significantly in low-resource language contexts. Current multilingual enhancement methods are often limited to the text modality or rely solely on machine translation. While such approaches help models acquire basic linguistic capabilities and produce "thin descriptions", they neglect multimodal informativeness and cultural groundedness, both of which are crucial for serving low-resource language users effectively. To bridge this gap, we identify two key objectives for a truly effective MLLM in low-resource language settings, namely 1) linguistic capability and 2) cultural groundedness, with special emphasis on cultural awareness. To achieve these dual objectives, we propose a dual-source strategy…
