Large Language Models and Arabic Content: A Review

Haneh Rhel; Dmitri Roussinov

arXiv:2505.08004·cs.CL·May 14, 2025

TL;DR

This paper reviews the application of large language models to Arabic NLP, covering models, techniques, datasets, and challenges, and highlights recent progress and ongoing research on this linguistically complex language.

Contribution

The paper provides a comprehensive overview of Arabic LLMs, covering models, techniques, benchmarks, and challenges, with emphasis on recent advancements and future directions.

Findings

01

Arabic LLMs achieve significant success in NLP tasks.

02

Fine-tuning and prompt engineering improve model performance.

03

Growing adoption of LLMs in Arabic NLP applications.

Abstract

Over the past three years, the rapid advancement of Large Language Models (LLMs) has had a profound impact on multiple areas of Artificial Intelligence (AI), particularly Natural Language Processing (NLP) across diverse languages, including Arabic. Although Arabic is one of the most widely spoken languages, used across 27 countries in the Arab world and as a second language in some non-Arab countries, Arabic resources, datasets, and tools remain scarce. Arabic NLP tasks face various challenges due to the complexities of the language, including its rich morphology, intricate structure, and diverse writing standards, among other factors. Researchers have been actively addressing these challenges, demonstrating that pre-trained LLMs trained on multilingual corpora achieve significant success in various Arabic NLP…
