Probing Cultural Signals in Large Language Models through Author Profiling

Valentin Lafargue; Ariel Guerra-Adames; Emmanuelle Claeys; Elouan Vuichard; Jean-Michel Loubes

arXiv:2603.16749 · cs.CL · March 20, 2026

TL;DR

This paper investigates cultural biases in large language models by evaluating their ability to perform author profiling on song lyrics, revealing systematic ethnic biases and proposing fairness metrics.

Contribution

It introduces a novel zero-shot author profiling method for LLMs on song lyrics and quantifies cultural biases using new fairness metrics.

Findings

1. LLMs achieve non-trivial profiling accuracy
2. Models show systematic North American bias
3. DeepSeek-1.5B aligns more with Asian ethnicity

Abstract

Large language models (LLMs) are increasingly deployed in applications with societal impact, raising concerns about the cultural biases they encode. We probe these representations by evaluating whether LLMs can perform author profiling from song lyrics in a zero-shot setting, inferring singers' gender and ethnicity without task-specific fine-tuning. Across several open-source models evaluated on more than 10,000 lyrics, we find that LLMs achieve non-trivial profiling performance but demonstrate systematic cultural alignment: most models default toward North American ethnicity, while DeepSeek-1.5B aligns more strongly with Asian ethnicity. This finding emerges from both the models' prediction distributions and an analysis of their generated rationales. To quantify these disparities, we introduce two fairness metrics, Modality Accuracy Divergence (MAD) and Recall Divergence (RD), and show…
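The abstract names Modality Accuracy Divergence (MAD) and Recall Divergence (RD) but does not give their formulas, so the sketch below does not reproduce either metric. It only computes two standard quantities behind the kind of analysis the abstract describes: the share of predictions falling on each ethnicity label (which reveals a default such as North American) and per-class recall. The `recall_gap` helper (max minus min class recall) is a hypothetical disparity summary for illustration, not the paper's RD. The toy labels are invented for the example and are not data from the paper.

```python
from collections import Counter


def prediction_shares(preds):
    """Fraction of all predictions assigned to each label."""
    counts = Counter(preds)
    total = len(preds)
    return {label: n / total for label, n in counts.items()}


def per_class_recall(golds, preds):
    """Recall per gold label: correct predictions divided by gold occurrences."""
    if len(golds) != len(preds):
        raise ValueError("golds and preds must have the same length")
    support = Counter(golds)
    hits = Counter(g for g, p in zip(golds, preds) if g == p)
    return {label: hits[label] / n for label, n in support.items()}


def recall_gap(recalls):
    """Hypothetical disparity summary (not the paper's RD): best minus worst class recall."""
    values = list(recalls.values())
    return max(values) - min(values)


# Toy labels for illustration only (not data from the paper).
golds = ["North American", "Asian", "European", "Asian", "North American", "European"]
preds = ["North American", "North American", "North American", "Asian", "North American", "European"]

shares = prediction_shares(preds)
recalls = per_class_recall(golds, preds)
print(shares)
print(recalls)
print(recall_gap(recalls))
```

In this toy run, two thirds of the predictions are "North American", and that class has perfect recall while the other two classes sit at 0.5. This is the pattern a model defaulting toward one ethnicity would produce.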

Code & Models

Datasets

ValentinLAFARGUE/AuthorProfilingResults (dataset, 37 downloads)

Taxonomy

Topics: Authorship Attribution and Profiling · Computational and Text Analysis Methods · Hate Speech and Cyberbullying Detection