Fanar 2.0: Arabic Generative AI Stack

FANAR TEAM; Ummar Abbas; Mohammad Shahmeer Ahmad; Minhaj Ahmad; Abdulaziz Al-Homaid; Anas Al-Nuaimi; Enes Altinisik; Ehsaneddin Asgari; Sanjay Chawla; Shammur Chowdhury; Fahim Dalvi; Kareem Darwish; Nadir Durrani; Mohamed Elfeky; Ahmed Elmagarmid; Mohamed Eltabakh; Asim Ersoy; Masoomali Fatehkia; Mohammed Qusay Hashim; Majd Hawasly; Mohamed Hefeeda; Mus'ab Husaini; Keivin Isufaj; Soon-Gyo Jung; Houssam Lachemat; Ji Kim Lucas; Abubakr Mohamed; Tasnim Mohiuddin; Basel Mousi; Hamdy Mubarak; Ahmad Musleh; Mourad Ouzzani; Amin Sadeghi; Husrev Taha Sencar; Mohammed Shinoy; Omar Sinan; Yifan Zhang

arXiv:2603.16397·cs.CL·March 18, 2026

TL;DR

Fanar 2.0 is Qatar's Arabic-centric Generative AI platform. It reaches strong performance across a broad set of capabilities under limited compute by relying on sovereign infrastructure, data quality over quantity, targeted continual pre-training, and model merging.

Contribution

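It introduces Fanar-27B, an Arabic language model continually pre-trained from a Gemma-3-27B backbone on 120 billion curated tokens (8x fewer pre-training tokens than Fanar 1.0), together with a broader AI stack covering safety, speech, vision, and multi-modal tools, all designed and operated in-house at QCRI.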

Findings

01 Fanar-27B improves significantly on Arabic benchmarks despite using 8x fewer pre-training tokens than Fanar 1.0.

02 The platform reaches competitive performance under resource-constrained training on 256 NVIDIA H100 GPUs.

03 New capabilities include bilingual moderation, speech recognition, image understanding, and multi-agent workflows.

Abstract

We present Fanar 2.0, the second generation of Qatar's Arabic-centric Generative AI platform. Sovereignty is a first-class design principle: every component, from data pipelines to deployment infrastructure, was designed and operated entirely at QCRI, Hamad Bin Khalifa University. Fanar 2.0 is a story of resource-constrained excellence: the effort ran on 256 NVIDIA H100 GPUs, with Arabic having only ~0.5% of web data despite 400 million native speakers. Fanar 2.0 adopts a disciplined strategy of data quality over quantity, targeted continual pre-training, and model merging to achieve substantial gains within these constraints. At the core is Fanar-27B, continually pre-trained from a Gemma-3-27B backbone on a curated corpus of 120 billion high-quality tokens across three data recipes. Despite using 8x fewer pre-training tokens than Fanar 1.0, it delivers substantial benchmark…
