Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

Matthias Orlikowski; Jiaxin Pei; Paul Röttger; Philipp Cimiano; David Jurgens; Dirk Hovy

arXiv:2502.20897·cs.CL·March 3, 2025

TL;DR

This study investigates whether large language models can be trained to model how annotators' sociodemographic characteristics relate to their subjective annotations, and finds that trained models capture little meaningful connection between the two.

Contribution

The paper shows that fine-tuning LLMs improves their performance under sociodemographic prompting, but that the gain comes mainly from learning annotator-specific behavior rather than genuine sociodemographic patterns.

Findings

01. Models learn annotator-specific behavior rather than sociodemographic patterns.

02. Performance gain is largely due to learning individual annotator behavior.

03. LLMs show limited meaningful connection between sociodemographics and annotations.

Abstract

People naturally vary in their annotations for subjective questions and some of this variation is thought to be due to the person's sociodemographic characteristics. LLMs have also been used to label data, but recent work has shown that models perform poorly when prompted with sociodemographic attributes, suggesting limited inherent sociodemographic knowledge. Here, we ask whether LLMs can be trained to be accurate sociodemographic models of annotator variation. Using a curated dataset of five tasks with standardized sociodemographics, we show that models do improve in sociodemographic prompting when trained but that this performance gain is largely due to models learning annotator-specific behaviour rather than sociodemographic patterns. Across all tasks, our results suggest that models learn little meaningful connection between sociodemographics and annotation, raising doubts about…
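
The contrast the abstract draws, between conditioning on sociodemographic attributes and learning annotator-specific behaviour, can be made concrete with a minimal sketch of how model inputs might differ across conditions. Everything here is an assumption for illustration: the function `build_prompt`, the prompt wording, the attribute names and values, and the annotator identifier `A017` are hypothetical and are not taken from the paper, and whether the authors' setup uses an identifier-only input like this is also an assumption.

```python
from typing import Mapping, Optional


def build_prompt(text: str,
                 attributes: Optional[Mapping[str, str]] = None,
                 annotator_id: Optional[str] = None) -> str:
    """Assemble a rating prompt, optionally conditioned on annotator info.

    The wording and field layout are hypothetical, not the paper's format.
    """
    lines = []
    if annotator_id is not None:
        # Identifier-only conditioning: lets a model memorise one person's habits.
        lines.append(f"Annotator: {annotator_id}")
    if attributes:
        # Sort keys so the same profile always yields the same prompt.
        for key in sorted(attributes):
            lines.append(f"{key.capitalize()}: {attributes[key]}")
    lines.append(f"Text: {text}")
    lines.append("Rating:")
    return "\n".join(lines)


text = "Example comment to be rated."
profile = {"age": "25-34", "gender": "woman"}  # made-up values

text_only = build_prompt(text)                       # no annotator information
with_attributes = build_prompt(text, attributes=profile)
with_id = build_prompt(text, annotator_id="A017")    # hypothetical identifier
```

Comparing a model fine-tuned on attribute-conditioned inputs against one fine-tuned on identifier-conditioned inputs is one way to ask whether a gain reflects sociodemographic patterns or individual annotator behaviour. If the identifier-only variant performs about as well, the attributes themselves are probably contributing little.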
