CultranAI at PalmX 2025: Data Augmentation for Cultural Knowledge Representation
Hunzalah Hassan Bhatti, Youssef Ahmed, Md Arid Hasan, Firoj Alam

TL;DR
This paper presents CultranAI, a system that uses data augmentation and LoRA fine-tuning of large language models to improve Arabic cultural knowledge representation. It ranked 5th in the PalmX 2025 shared task with 70.50% accuracy on the blind test set.
Contribution
The paper combines data augmentation with LoRA fine-tuning of large language models for Arabic cultural knowledge representation. It contributes a newly curated dataset of over 22K culturally grounded multiple-choice questions and benchmark results across several LLMs.
Findings
Fanar-1-9B-Instruct achieved the highest performance among the benchmarked LLMs
The augmented dataset improved model accuracy
The system ranked 5th with 70.50% accuracy on the blind test set
Abstract
In this paper, we report our participation in the PalmX cultural evaluation shared task. Our system, CultranAI, focused on data augmentation and LoRA fine-tuning of large language models (LLMs) for Arabic cultural knowledge representation. We benchmarked several LLMs to identify the best-performing model for the task. In addition to utilizing the PalmX dataset, we augmented it by incorporating the Palm dataset and curated a new dataset of over 22K culturally grounded multiple-choice questions (MCQs). Our experiments showed that the Fanar-1-9B-Instruct model achieved the highest performance. We fine-tuned this model on the combined augmented dataset of 22K+ MCQs. On the blind test set, our submitted system ranked 5th with an accuracy of 70.50%, while on the PalmX development set, it achieved an accuracy of 84.1%.
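The accuracy figures above are the fraction of MCQs answered correctly. As a rough illustration only, the sketch below shows one way an MCQ could be turned into a prompt and a list of predicted option letters scored against gold labels. The `MCQ` class, the `format_prompt` and `accuracy` helpers, the four-option layout, and the prompt wording are all assumptions made for this example; the abstract does not describe the actual prompt format or evaluation code.

```python
from dataclasses import dataclass

# Hypothetical container for one multiple-choice question (not from the paper).
@dataclass
class MCQ:
    question: str
    options: list[str]  # answer choices, in display order
    answer: str         # gold option letter, e.g. "B"

# Assumption: at most four options per question, labelled A-D.
LETTERS = "ABCD"

def format_prompt(item: MCQ) -> str:
    """Render a question and its lettered options as a plain-text prompt."""
    lines = [item.question]
    for letter, option in zip(LETTERS, item.options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)

def accuracy(predictions: list[str], items: list[MCQ]) -> float:
    """Fraction of predicted option letters that match the gold letters."""
    if len(predictions) != len(items):
        raise ValueError("predictions and items must have the same length")
    if not items:
        return 0.0
    correct = sum(
        pred.strip().upper() == item.answer.strip().upper()
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

In such a setup, each prompt would be passed to the fine-tuned model, the returned option letter collected, and `accuracy` called once over the whole development or test split.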
