Missing the Margins: A Systematic Literature Review on the Demographic Representativeness of LLMs
Indira Sen, Marlene Lutz, Elisa Rogers, David Garcia, and Markus Strohmaier

TL;DR
This systematic review examines how well Large Language Models (LLMs) reflect demographic diversity. It finds that many studies lack comprehensive evaluation and points to the need for better assessment methods to support equitable use.
Contribution
The paper analyzes 211 papers on the demographic representativeness of LLMs, exposing gaps and inconsistencies in current evaluation practices.
Findings
29% of the studies report positive conclusions on the representativeness of LLMs
30% of those positive studies do not evaluate LLMs across multiple demographic categories or within demographic subcategories
Over a third do not define their target populations
Abstract
Many applications of Large Language Models (LLMs) require them to either simulate people or offer personalized functionality, making the demographic representativeness of LLMs crucial for equitable utility. At the same time, we know little about the extent to which these models actually reflect the demographic attributes and behaviors of certain groups or populations, with conflicting findings in empirical research. To shed light on this debate, we review 211 papers on the demographic representativeness of LLMs. We find that while 29% of the studies report positive conclusions on the representativeness of LLMs, 30% of these do not evaluate LLMs across multiple demographic categories or within demographic subcategories. Another 35% and 47% of the papers concluding positively fail to specify these subcategories altogether for gender and race, respectively. Of the articles that do report…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods · Topic Modeling
