AlignCultura: Towards Culturally Aligned Large Language Models?

Gautam Siddharth Kashyap; Mark Dras; Usman Naseem

arXiv:2604.19016·cs.CL·April 22, 2026

TL;DR

This paper introduces AlignCultura, a two-stage pipeline and dataset for evaluating and improving cultural alignment in large language models based on UNESCO principles.

Contribution

It presents a novel dataset and benchmarking framework for cultural alignment, enabling systematic evaluation aligned with UNESCO's cultural diversity principles.

Findings

1. Culturally fine-tuned models improve HHH scores by 4%-6%.

2. Cultural failures are reduced by 18% in fine-tuned models.

3. Leakage in cultural responses is limited to 0.3%.

Abstract

Cultural alignment in Large Language Models (LLMs) is essential for producing contextually aware, respectful, and trustworthy outputs. Without it, models risk generating stereotyped, insensitive, or misleading responses that fail to reflect cultural diversity with respect to the Helpful, Harmless, and Honest (HHH) paradigm. Existing benchmarks represent early steps toward cultural alignment; yet no benchmark currently enables systematic evaluation of cultural alignment in line with UNESCO's principles of cultural diversity with respect to the HHH paradigm. To address this gap, we built AlignCultura, a two-stage pipeline for cultural alignment. Stage I constructs CULTURAX, the HHH-English dataset grounded in the UNESCO cultural taxonomy, through Query Construction, which reclassifies prompts, expands underrepresented domains (or labels), and prevents data leakage with SimHash. Then, Response Generation…
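The abstract names SimHash as the leakage-prevention step but does not describe its configuration. As a rough illustration only, the following is a minimal sketch of generic SimHash near-duplicate detection. The tokenization (lowercase word tokens), the 64-bit fingerprint, the SHA-256 token hash, the `max_distance` threshold of 3, and all function names are assumptions made for this example, not details taken from the paper.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption, not taken from the paper


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint built from lowercase word tokens."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to 64 bits (first 8 bytes of its SHA-256 digest).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive.
    fingerprint = 0
    for i in range(BITS):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(query: str, references: list[str], max_distance: int = 3) -> bool:
    """Flag a query whose fingerprint is within max_distance bits of any reference.

    The threshold of 3 is a hypothetical default chosen for illustration.
    """
    fp = simhash(query)
    return any(hamming_distance(fp, simhash(ref)) <= max_distance for ref in references)
```

In a pipeline of this kind, a candidate prompt flagged by `is_near_duplicate` against an evaluation set would typically be dropped before training, so that near-identical text does not appear on both sides of a split.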

Code & Models

Datasets

GautamKashyap/CulturaX
dataset · 37 downloads
