Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic
Fakhraddin Alwajih, Gagan Bhatia, Muhammad Abdul-Mageed

TL;DR
Dallah is an Arabic multimodal large language model built on a LLaMA-2-based language model and fine-tuned on six Arabic dialects. It aims to improve multimodal understanding and generation in Arabic, where high-quality resources are scarce and dialectal variation adds complexity.
Contribution
Introducing Dallah, the first efficient dialect-aware Arabic multimodal LLM based on LLaMA-2, with state-of-the-art performance on Arabic multimodal benchmarks.
Findings
State-of-the-art performance on Arabic MLLM benchmarks
Effective handling of six Arabic dialects in multimodal tasks
Robust responses for both Modern Standard Arabic and dialects
Abstract
Recent advancements have significantly enhanced the capabilities of Multimodal Large Language Models (MLLMs) in generating and understanding image-to-text content. Despite these successes, progress is predominantly limited to English due to the scarcity of high-quality multimodal resources in other languages. This limitation impedes the development of competitive models in languages such as Arabic. To alleviate this situation, we introduce an efficient Arabic multimodal assistant, dubbed Dallah, that utilizes an advanced language model based on LLaMA-2 to facilitate multimodal interactions. Dallah demonstrates state-of-the-art performance among Arabic MLLMs. Through fine-tuning on six Arabic dialects, Dallah showcases its capability to handle complex dialectal interactions incorporating both textual and visual elements. The model excels in two benchmark tests: one evaluating its performance…
Taxonomy
Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Topic Modeling
