Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
Prateek Rajput, Yewei Song, Iyiola E. Olatunji, Jacques Klein, Tegawendé F. Bissyandé

TL;DR
This paper investigates whether large language models can reliably express stable, human-like personalities through fine-tuning, revealing limitations in their ability to faithfully embody complex personality profiles.
Contribution
It demonstrates that fine-tuning reduces response variance but does not improve the accuracy of the full personality profile, highlighting fundamental challenges in personality induction.
Findings
Fine-tuning reduces questionnaire response variance across models.
Stability increases but profile accuracy remains near chance.
Unguided essays lack sufficient cues for faithful personality expression.
Abstract
Can large language models reliably express a human-like personality, or are they merely mimicking surface cues without a stable underlying profile? To investigate this, we induce personality in LLMs by fine-tuning them on long-form essays, where each essay is associated with a target Big Five personality profile. We then evaluate the stability and fidelity of the induced personality using the IPIP-NEO questionnaire. Specifically, we ask: (i) does post-training (SFT, DPO, ORPO) stabilize questionnaire scores under prompt rephrasings, and (ii) can it induce target Big Five profiles from unguided essays? Our results demonstrate that fine-tuning consistently reduces variance in questionnaire responses across five models, directly mitigating the evaluation fragility reported in pre-trained models. However, this newfound stability reveals a more fundamental limitation: accuracy on the…
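The abstract contrasts two quantities: stability of questionnaire scores across prompt rephrasings, and fidelity of those scores to a target Big Five profile. A minimal Python sketch of both is given below to make the distinction concrete. The function names, the 1 to 5 trait-score scale, and the high/low split at 3.0 are hypothetical choices for illustration, not the paper's exact metrics.

```python
from statistics import mean, pvariance

# Big Five traits: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.
TRAITS = ("O", "C", "E", "A", "N")


def trait_score_variance(scores_per_rephrasing):
    """Hypothetical stability metric: mean per-trait variance across rephrasings.

    scores_per_rephrasing: list of dicts mapping trait -> score, one dict per
    prompt rephrasing of the questionnaire. Lower values mean more stable answers.
    """
    return mean(
        pvariance([run[t] for run in scores_per_rephrasing]) for t in TRAITS
    )


def profile_accuracy(predicted, target, threshold=3.0):
    """Hypothetical fidelity metric against a binarized target profile.

    predicted: dict trait -> score (assumed 1-5 scale).
    target: dict trait -> bool (True means "high" on that trait).
    Returns (fraction of traits matched, whether the full profile matched).
    """
    predicted_high = {t: predicted[t] >= threshold for t in TRAITS}
    hits = [predicted_high[t] == target[t] for t in TRAITS]
    return sum(hits) / len(TRAITS), all(hits)


if __name__ == "__main__":
    # Made-up scores for two rephrasings of the same questionnaire.
    runs = [
        {"O": 3.8, "C": 2.9, "E": 3.1, "A": 4.0, "N": 2.2},
        {"O": 3.6, "C": 3.1, "E": 3.3, "A": 4.2, "N": 2.0},
    ]
    target = {"O": True, "C": True, "E": False, "A": True, "N": False}
    print(trait_score_variance(runs))          # small: answers are stable
    print(profile_accuracy(runs[0], target))   # but only 3 of 5 traits match
```

The example shows how a model can score low on variance (stable) while still missing the target on several traits. Under this binarization, a random high/low guess matches each trait with probability 0.5 and the full five-trait profile with probability 1/32, which is one way to read a "near chance" result; the paper's own baseline may be defined differently.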
