Exploring Retrieval Augmented Generation in Arabic
Samhaa R. El-Beltagy, Mohamed A. Abdallah

TL;DR
This paper investigates the application of Retrieval Augmented Generation (RAG) techniques to Arabic, addressing language-specific challenges and evaluating different models to improve Arabic text generation.
Contribution
The paper presents a case study on implementing RAG for Arabic, comparing semantic embedding models in the retrieval stage and LLMs in the generation stage, and examining how a mismatch between document dialect and query dialect affects retrieval.
Findings
Semantic embedding models are effective for Arabic RAG.
LLMs can be successfully integrated into Arabic RAG pipelines.
Dialect variation impacts retrieval effectiveness.
Abstract
Recently, Retrieval Augmented Generation (RAG) has emerged as a powerful technique in natural language processing, combining the strengths of retrieval-based and generation-based models to enhance text generation tasks. However, the application of RAG in Arabic, a language with unique characteristics and resource constraints, remains underexplored. This paper presents a comprehensive case study on the implementation and evaluation of RAG for Arabic text. The work focuses on exploring various semantic embedding models in the retrieval stage and several LLMs in the generation stage, in order to investigate what works and what doesn't in the context of Arabic. The work also touches upon the issue of variations between document dialect and query dialect in the retrieval stage. Results show that existing semantic embedding models and LLMs can be effectively employed to build Arabic RAG…
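The two stages named in the abstract, retrieval over embeddings followed by generation from the retrieved context, can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: `embed` is a hypothetical toy stand-in (character trigram counts) for a real semantic embedding model, the sample documents and query are invented for the example, and `build_prompt` only assembles the text that would be passed to an LLM, without calling one.

```python
import math
from collections import Counter

def embed(text: str, n: int = 3) -> Counter:
    """Toy stand-in for a semantic embedding model (assumption):
    a sparse vector of character n-gram counts."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Retrieval stage: rank documents by similarity to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Generation stage input: retrieved passages plus the question.
    A real pipeline would send this string to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented sample data for illustration only.
DOCUMENTS = [
    "القاهرة هي عاصمة مصر",        # Cairo is the capital of Egypt
    "الرياض هي عاصمة السعودية",    # Riyadh is the capital of Saudi Arabia
    "النيل أطول نهر في أفريقيا",   # The Nile is the longest river in Africa
]
QUERY = "ما هي عاصمة مصر"          # What is the capital of Egypt?

if __name__ == "__main__":
    passages = retrieve(QUERY, DOCUMENTS, top_k=2)
    print(build_prompt(QUERY, passages))
```

Surface-level n-gram overlap like this would be expected to degrade when the query and the documents are written in different dialects, which is one reason a learned semantic embedding model would replace `embed` in practice.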
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · WordPiece · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Adam
