Progressing beyond Art Masterpieces or Touristic Clichés: how to assess your LLMs for cultural alignment?
António Branco, João Silva, Nuno Marques, Luis Gomes, Ricardo Campos, Raquel Sequeira, Sara Nerea, Rodrigo Silva, Miguel Marques, Rodrigo Duarte, Artur Putyato, Diogo Folques, Tiago Valente

TL;DR
This paper reviews current methods for assessing cultural alignment in LLMs, proposes improved dataset design guidelines, and demonstrates their effectiveness through contrastive experiments.
Contribution
It introduces new design guidelines for cultural assessment datasets and shows these improve discrimination between culturally specialized and general models.
Findings
The new dataset design yields test sets with greater discriminative power.
Contrastive experiments distinguish culturally specialized models.
Proposed guidelines address limitations of existing datasets.
Abstract
Although the cultural (mis)alignment of Large Language Models (LLMs) has attracted increasing attention (often framed in terms of cultural bias), until recently there has been limited work on the design and development of datasets for cultural assessment. Here, we review existing approaches to such datasets and identify their main limitations. To address these issues, we propose design guidelines for annotators and report on the construction of a dataset built according to these principles. We further present a series of contrastive experiments conducted with this dataset. The results demonstrate that our design yields test sets with greater discriminative power, effectively distinguishing between models specialized for a given culture and those that are not, ceteris paribus.
