Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
Tristan Williams, Franziska Weeber, Sebastian Padó, Alan Akbik

TL;DR
This paper introduces a new framework for evaluating how well large language models represent human opinions by analyzing multivariate correlation patterns, revealing limitations of current alignment methods.
Contribution
The authors propose a multivariate correlation-based evaluation framework to assess the representativeness of aligned language models beyond marginal distributions.
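To make the distinction between marginal and multivariate agreement concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the use of total variation distance for marginals, and the use of Pearson item-item correlations are all assumptions chosen for illustration, and the toy data is invented.

```python
import numpy as np

def marginal_distance(human, model, n_options):
    """Mean total variation distance between per-item answer distributions.

    human, model: integer arrays of shape (respondents, items), with answers
    coded 0..n_options-1 (a hypothetical encoding, not the paper's).
    """
    dists = []
    for j in range(human.shape[1]):
        p = np.bincount(human[:, j], minlength=n_options) / human.shape[0]
        q = np.bincount(model[:, j], minlength=n_options) / model.shape[0]
        dists.append(0.5 * np.abs(p - q).sum())
    return float(np.mean(dists))

def correlation_gap(human, model):
    """Mean absolute difference between off-diagonal item-item correlations."""
    corr_h = np.corrcoef(human, rowvar=False)
    corr_m = np.corrcoef(model, rowvar=False)
    upper = np.triu_indices_from(corr_h, k=1)  # each item pair counted once
    return float(np.mean(np.abs(corr_h[upper] - corr_m[upper])))

# Toy data: both populations answer each item uniformly, but the two items
# move together for "human" and in opposite directions for "model".
human = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
model = np.array([[0, 3], [1, 2], [2, 1], [3, 0]])

print(marginal_distance(human, model, n_options=4))
print(correlation_gap(human, model))
```

In this toy case the per-item distributions are identical while the item-item correlation has the opposite sign, which is the kind of mismatch a marginal-only evaluation cannot detect.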
Findings
Demonstrated that demographic fine-tuning better matches marginal response distributions.
Showed persona prompting better reproduces correlation structures between survey items.
Found that neither technique fully aligns with human correlation patterns.
Abstract
Large language models are increasingly used to represent human opinions, values, or beliefs, and their steerability towards these ideals is an active area of research. Existing work focuses predominantly on aligning marginal response distributions, treating each alignment evaluation example independently. While essential, this may overlook deeper latent structures that characterise real populations and underpin cultural values theories. We propose a framework for evaluating the representativeness of aligned models through multivariate correlation patterns in addition to marginal distributions. We show the value of our evaluation scheme by comparing two model steering techniques (persona prompting and demographic fine-tuning) and evaluating them against human responses from the World Values Survey. While the demographic fine-tuned model better approximates marginal response…
