PersianRAG: A Retrieval-Augmented Generation System for Persian Language
Hossein Hosseini, Mohammad Sobhan Zare, Amir Hossein Mohammadi, Arefeh Kazemi, Zahra Zojaji, Mohammad Ali Nematbakhsh

TL;DR
PersianRAG introduces a retrieval-augmented generation system tailored for Persian, overcoming language-specific challenges to improve question answering performance in low-resource settings.
Contribution
The paper presents novel solutions for implementing RAG models in Persian, addressing unique preprocessing, retrieval, and evaluation challenges.
Findings
Enhanced question answering accuracy in Persian.
Effective handling of Persian language challenges in RAG models.
Demonstrated improvements on Persian benchmark datasets.
Abstract
Retrieval-augmented generation (RAG) models, which integrate large-scale pre-trained generative models with external retrieval mechanisms, have shown significant success in various natural language processing (NLP) tasks. However, applying RAG models to Persian, a low-resource language, poses distinct challenges. These challenges primarily involve the system's preprocessing, embedding, retrieval, prompt construction, language modeling, and response evaluation. In this paper, we address the challenges of implementing a real-world RAG system for the Persian language, called PersianRAG. We propose novel solutions to overcome these obstacles and evaluate our approach on several Persian benchmark datasets. Our experimental results demonstrate the capability of the PersianRAG framework to improve question answering in Persian.
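To make the pipeline stages named in the abstract concrete, the following is a minimal, illustrative sketch of three of them: preprocessing, retrieval, and prompt construction. It is not the PersianRAG implementation. The function names (`normalize`, `embed`, `retrieve`, `build_prompt`), the character mappings, the bag-of-words "embedding", and the prompt template are all assumptions chosen for illustration; a real system would use a learned embedding model and a more careful normalizer.

```python
import math
import re
from collections import Counter

# Assumed normalization rules: map Arabic code points that often appear
# in Persian text to their Persian equivalents (Arabic yeh and kaf).
CHAR_MAP = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9"})
DIACRITICS = re.compile(r"[\u064B-\u0652]")  # Arabic short-vowel marks
ZWNJ = "\u200c"  # zero-width non-joiner, common inside Persian words


def normalize(text: str) -> str:
    """Unify character variants, drop diacritics, collapse whitespace."""
    text = text.translate(CHAR_MAP)
    text = DIACRITICS.sub("", text)
    # Simplification (an assumption): treat ZWNJ as a token boundary.
    text = text.replace(ZWNJ, " ")
    return re.sub(r"\s+", " ", text).strip()


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(normalize(text).split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Hypothetical prompt template: retrieved context, then the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {normalize(query)}\nAnswer:"


docs = ["تهران پایتخت ایران است", "شیراز شهر حافظ است"]
# The query is written with Arabic yeh (U+064A); normalization lets it
# match documents written with Persian yeh (U+06CC).
query = "پا\u064Aتخت ا\u064Aران"
prompt = build_prompt(query, retrieve(query, docs, k=1))
```

The sketch shows why preprocessing matters for retrieval in Persian: without mapping the Arabic yeh to the Persian yeh, the query tokens would share no terms with the first document, and its similarity score would be zero.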
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Linear Layer · Softmax · Dropout · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Adam · Attention Is All You Need
