CultranAI at PalmX 2025: Data Augmentation for Cultural Knowledge Representation

Hunzalah Hassan Bhatti; Youssef Ahmed; Md Arid Hasan; Firoj Alam

arXiv:2508.17324·cs.CL·October 2, 2025

TL;DR

This paper presents CultranAI, a system that uses data augmentation and LoRA fine-tuning of large language models to improve Arabic cultural knowledge representation, achieving competitive results in the PalmX 2025 shared task.

Contribution

The paper combines data augmentation with LoRA fine-tuning of large language models (LLMs) for Arabic cultural knowledge representation. It curates a new dataset of over 22K culturally grounded multiple-choice questions (MCQs) and reports benchmarking results across several LLMs.

Findings

01 Fanar-1-9B-Instruct achieved the highest performance among the benchmarked LLMs

02 The augmented dataset improved model accuracy

03 The system ranked 5th with 70.50% accuracy on the blind test set

Abstract

In this paper, we report on our participation in the PalmX cultural evaluation shared task. Our system, CultranAI, focused on data augmentation and LoRA fine-tuning of large language models (LLMs) for Arabic cultural knowledge representation. We benchmarked several LLMs to identify the best-performing model for the task. In addition to using the PalmX dataset, we augmented it by incorporating the Palm dataset and curated a new dataset of over 22K culturally grounded multiple-choice questions (MCQs). Our experiments showed that the Fanar-1-9B-Instruct model achieved the highest performance. We fine-tuned this model on the combined augmented dataset of 22K+ MCQs. On the blind test set, our submitted system ranked 5th with an accuracy of 70.50%, while on the PalmX development set it achieved an accuracy of 84.1%.
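The abstract does not give implementation details, so the following is only a minimal sketch of what merging several MCQ sources and scoring predictions by accuracy could look like. The `MCQ` record, the function names, the duplicate-detection rule, and the prompt layout are all assumptions made for illustration, not the authors' pipeline or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCQ:
    """Hypothetical record layout for one multiple-choice question."""
    question: str
    options: tuple  # answer choices, in display order
    answer: str     # gold label such as "A"


def merge_mcqs(*sources):
    """Concatenate MCQ lists, keeping the first copy of each question.

    Assumption: two questions count as duplicates when they match after
    lowercasing and collapsing whitespace.
    """
    seen, merged = set(), []
    for source in sources:
        for item in source:
            key = " ".join(item.question.split()).lower()
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def format_prompt(item):
    """Render one MCQ as a lettered prompt ending in 'Answer:'."""
    labels = "ABCDEFGH"
    lines = [item.question]
    lines += [f"{labels[i]}. {opt}" for i, opt in enumerate(item.options)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions, items):
    """Fraction of predicted labels that equal the gold label."""
    correct = sum(
        pred.strip().upper() == item.answer
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

In a setup like this, the merged list would be rendered with `format_prompt` to build fine-tuning examples, and `accuracy` would be computed over a model's predicted letters on a held-out split.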
