Nationality encoding in language model hidden states: Probing culturally differentiated representations in persona-conditioned academic text

Paul Jackson (1), Ruizhe Li (2), and Elspeth Edelstein (3) ((1) Language Centre, School of Language, Literature, Music and Visual Culture, University of Aberdeen, United Kingdom; (2) School of Natural and Computing Sciences, University of Aberdeen, United Kingdom; (3) School of Language, Literature, Music and Visual Culture, University of Aberdeen, United Kingdom)

arXiv:2604.10151·cs.CL·April 15, 2026

TL;DR

This study tests whether Gemma-3-4b-it encodes nationality-discriminative information in its hidden states when generating research article introductions conditioned on British or Chinese academic personas, using logistic regression probes trained on each layer's activations.

Contribution

The paper shows that the model's hidden states contain nationality-discriminative information, and that British and Chinese personas are associated with distinct structural and lexical patterns.

Findings

01 High accuracy in nationality classification from hidden states at Layer 18

02 Distinct structural and lexical patterns for British and Chinese personas

03 No significant nationality differences in the final generated surface text

Abstract

Large language models are increasingly used as writing tools and pedagogical resources in English for Academic Purposes, but it remains unclear whether they encode culturally differentiated representations when generating academic text. This study tests whether Gemma-3-4b-it encodes nationality-discriminative information in hidden states when generating research article introductions conditioned by British and Chinese academic personas. A corpus of 270 texts was generated from 45 prompt templates crossed with six persona conditions in a 2 x 3 design. Logistic regression probes were trained on hidden-state activations across all 35 layers, with shuffled-label baselines, a surface-text skyline classifier, cross-family tests, and sentence-level baselines used as controls. Probe-selected token positions were annotated for structural, lexical, and stance features using the Stanza NLP…
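
The design described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code: synthetic Gaussian vectors stand in for Gemma-3-4b-it activations at a single layer, a hand-written gradient-descent logistic regression stands in for whatever solver the study used, the three levels of the second persona factor are given placeholder names because the abstract does not name them, and holding out whole templates for testing is an assumption. The sketch shows only the logic of comparing a probe against a shuffled-label baseline on a 45 × 6 corpus.

```python
import math
import random
from itertools import product

rng = random.Random(0)

# Design stated in the abstract: 45 templates x 6 persona conditions (2 x 3) = 270 texts.
NATIONALITIES = ["British", "Chinese"]
SECOND_FACTOR = ["level_1", "level_2", "level_3"]  # hypothetical: the source does not name these levels
corpus = list(product(range(45), NATIONALITIES, SECOND_FACTOR))

def fake_activation(nationality, dim=8):
    """Synthetic stand-in for one layer's hidden state (NOT real model output)."""
    centre = 1.0 if nationality == "Chinese" else -1.0
    return [rng.gauss(centre, 1.0) for _ in range(dim)]

X = [fake_activation(nat) for _, nat, _ in corpus]
y = [1 if nat == "Chinese" else 0 for _, nat, _ in corpus]

# Assumption: hold out whole templates (indices 36-44) so no template is in both splits.
is_test = [template >= 36 for template, _, _ in corpus]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_probe(pairs, epochs=200, lr=0.1):
    """Fit a logistic regression probe by full-batch gradient descent."""
    w, b = [0.0] * len(pairs[0][0]), 0.0
    n = len(pairs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, label in pairs:
            err = sigmoid(score(w, b, x)) - label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(features, labels):
    """Train on the training templates, report accuracy on the held-out ones."""
    train = [(f, l) for f, l, held in zip(features, labels, is_test) if not held]
    test = [(f, l) for f, l, held in zip(features, labels, is_test) if held]
    w, b = train_probe(train)
    hits = sum((score(w, b, x) > 0) == (label == 1) for x, label in test)
    return hits / len(test)

real_acc = probe_accuracy(X, y)

# Shuffled-label control: permute labels over the whole corpus, then repeat the procedure.
shuffled_labels = y[:]
rng.shuffle(shuffled_labels)
shuffled_acc = probe_accuracy(X, shuffled_labels)

print(f"texts: {len(corpus)}")
print(f"probe accuracy: {real_acc:.2f}  shuffled-label baseline: {shuffled_acc:.2f}")
```

In an actual replication the feature vectors would come from the model's per-layer hidden states, the procedure would be repeated for every layer, and a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written trainer. The point of the control is that a probe is only informative if it clearly beats the same pipeline run on permuted labels.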
