Fairness in LLM-Generated Surveys

Andrés Abeliuk, Vanessa Gaete, Naim Bro

arXiv:2501.15351 · cs.CY · January 28, 2025

TL;DR

This paper investigates biases in Large Language Models (LLMs) used to generate survey responses for different socio-demographic and geographic groups. It reveals performance disparities that the authors trace to training-data bias and proposes a framework for assessing fairness.

Contribution

The paper introduces a novel framework for measuring socio-demographic biases in LLMs and highlights the importance of cross-cultural fairness in survey applications.
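
The contribution centers on measuring how prediction accuracy varies across socio-demographic groups. As an illustration only, the sketch below computes the per-group accuracy of LLM-simulated answers against real survey answers and summarizes disparity as the gap between the best- and worst-served groups. The record fields (`predicted`, `actual`, `gender`), the function names, and the max-minus-min gap are hypothetical choices, not the paper's framework or its exact fairness metrics.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group value: share of records where the LLM answer matches the survey answer}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        hits[g] += int(r["predicted"] == r["actual"])
    return {g: hits[g] / totals[g] for g in totals}


def accuracy_gap(group_acc):
    """Largest minus smallest group accuracy; 0 means equal accuracy across groups."""
    return max(group_acc.values()) - min(group_acc.values())


# Hypothetical toy data: each record is one respondent-question pair,
# with the LLM's simulated answer ("predicted") and the real answer ("actual").
records = [
    {"gender": "female", "predicted": "agree", "actual": "agree"},
    {"gender": "female", "predicted": "agree", "actual": "disagree"},
    {"gender": "male", "predicted": "disagree", "actual": "disagree"},
    {"gender": "male", "predicted": "agree", "actual": "agree"},
]

acc = accuracy_by_group(records, "gender")
print(acc)                # {'female': 0.5, 'male': 1.0}
print(accuracy_gap(acc))  # 0.5
```

The same two functions can be rerun with another grouping key (for example political identity or religious affiliation) to compare which attributes are associated with larger gaps.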

Findings

1. LLMs perform better on U.S. datasets, which the authors attribute to U.S.-centric training data.
2. In the U.S., political identity and race affect prediction accuracy.
3. In Chile, gender, education, and religious affiliation influence performance.

Abstract

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, and can serve as an alternative to traditional surveys. However, their global applicability remains questionable because their biases across socio-demographic and geographic contexts are unexplored. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles.…
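
The abstract states that the U.S. advantage persists after accounting for socio-demographic differences, but the exact adjustment is not described here. One standard way to make such a comparison is direct standardization, sketched below as an assumption rather than as the authors' method: per-stratum accuracy in each country is reweighted by a single shared reference population, so that differences in population composition no longer drive the gap. The strata, accuracies, and weights are made-up placeholders, not results from the paper.

```python
def standardized_accuracy(stratum_acc, reference_weights):
    """Direct standardization: weight per-stratum accuracy by a shared reference mix.

    stratum_acc: {stratum: accuracy within that stratum for one country}
    reference_weights: {stratum: share of the reference population}, summing to 1
    """
    return sum(reference_weights[s] * stratum_acc[s] for s in reference_weights)


# Hypothetical per-stratum accuracies (stratum = education level in this toy example).
us_acc = {"no_degree": 0.70, "degree": 0.80}
cl_acc = {"no_degree": 0.60, "degree": 0.65}

# One common reference mix for both countries, so composition differences cancel out.
reference = {"no_degree": 0.5, "degree": 0.5}

us_std = standardized_accuracy(us_acc, reference)
cl_std = standardized_accuracy(cl_acc, reference)
print(round(us_std, 3), round(cl_std, 3), round(us_std - cl_std, 3))
```

If the standardized gap stays positive across reasonable reference mixes, composition alone does not explain the disparity. With many strata, a regression on correctness with country and demographic covariates is a common alternative.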

Taxonomy

Topics: Qualitative Comparative Analysis Research