AIN: The Arabic INclusive Large Multimodal Model
Ahmed Heakl, Sara Ghaboura, Omkar Thawkar, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan

TL;DR
The paper introduces AIN, a large bilingual Arabic-English multimodal model that achieves state-of-the-art performance in Arabic and demonstrates strong capabilities across diverse visual and language understanding tasks, bridging a significant gap in Arabic multimodal AI.
Contribution
The paper presents AIN, the first large-scale Arabic-inclusive multimodal model trained on 3.6 million high-quality samples, outperforming existing models like GPT-4o across multiple domains.
Findings
AIN outperforms GPT-4o by 3.4% on CAMEL-Bench.
AIN demonstrates strong performance in 38 sub-domains including medical imaging and remote sensing.
AIN achieves state-of-the-art results in Arabic multimodal understanding.
Abstract
Amid the swift progress of large language models (LLMs) and their evolution into large multimodal models (LMMs), significant strides have been made in high-resource languages such as English and Chinese. While Arabic LLMs have seen notable progress, Arabic LMMs remain largely unexplored, often narrowly focusing on a few specific aspects of the language and visual understanding. To bridge this gap, we introduce AIN, the Arabic Inclusive Multimodal Model, designed to excel across diverse domains. AIN is an English-Arabic bilingual LMM designed to excel in English and Arabic, leveraging 3.6 million carefully constructed, high-quality Arabic-English multimodal data samples. AIN demonstrates state-of-the-art Arabic performance, while also possessing strong English-language visual capabilities. On the recent CAMEL-Bench benchmark comprising 38 sub-domains, including multi-image understanding,…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
