Stable and Explainable Personality Trait Evaluation in Large Language Models with Internal Activations

Xiaoxu Ma; Xiangbo Zhang; Zhenyu Weng

arXiv:2601.09833·cs.CL·January 16, 2026

Open Access

TL;DR

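This paper introduces Persona-Vector Neutrality Interpolation (PVNI), an internal-activation-based method for stable and explainable personality trait evaluation in large language models. It targets two weaknesses of questionnaire-based approaches: sensitivity to prompt phrasing and role-play configuration, and limited explainability.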

Contribution

The paper proposes PVNI, which uses a model's internal activations for stable, interpretable personality assessment in LLMs, and supports it with theoretical analysis and extensive experimental validation.

Findings

01. PVNI provides more stable personality evaluations than existing methods.

02. PVNI maintains robustness under different prompt phrasing and role-play scenarios.

03. Theoretical analysis supports the effectiveness and generalization of PVNI.

Abstract

Evaluating personality traits in Large Language Models (LLMs) is key to model interpretation, comparison, and responsible deployment. However, existing questionnaire-based evaluation methods exhibit limited stability and offer little explainability, as their results are highly sensitive to minor variations in prompt phrasing or role-play configurations. To address these limitations, we propose an internal-activation-based approach, termed Persona-Vector Neutrality Interpolation (PVNI), for stable and explainable personality trait evaluation in LLMs. PVNI extracts a persona vector associated with a target personality trait from the model's internal activations using contrastive prompts. It then estimates the corresponding neutral score by interpolating along the persona vector as an anchor axis, enabling an interpretable comparison between the neutral prompt representation and the…
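
The abstract is truncated here, so the exact PVNI procedure is not visible. As a rough illustration of the two steps it does name (extracting a persona vector from contrastive prompts, then locating a neutral prompt along that axis), the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the persona vector is the difference between mean activations for trait-positive and trait-negative prompts, and that the neutral score is the normalized projection of the neutral activation onto that axis. The function names and the toy two-dimensional activations are hypothetical; a real setup would read hidden states from a chosen model layer.

```python
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def persona_vector(pos_acts, neg_acts):
    """Return (anchor, direction) for a trait axis.

    ASSUMPTION: the persona vector is taken as the difference of mean
    activations between trait-positive and trait-negative prompts.
    The anchor is the trait-negative mean, i.e. the axis origin.
    """
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    direction = [p - n for p, n in zip(mu_pos, mu_neg)]
    return mu_neg, direction


def neutral_score(neutral_act, anchor, direction):
    """Position of an activation along the trait axis.

    ASSUMPTION: the score is a normalized projection, so 0.0 lands on the
    trait-negative mean and 1.0 on the trait-positive mean. Components
    orthogonal to the axis are ignored.
    """
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0:
        raise ValueError("contrastive prompts produced identical mean activations")
    offset = [h - a for h, a in zip(neutral_act, anchor)]
    return sum(o * d for o, d in zip(offset, direction)) / norm_sq


# Toy activations (hypothetical values, 2 dimensions for readability).
pos_acts = [[2.0, 1.0], [4.0, 1.0]]   # prompts eliciting the trait
neg_acts = [[-1.0, 1.0], [1.0, 1.0]]  # prompts suppressing the trait
neutral_act = [1.5, 5.0]              # activation for a neutral prompt

anchor, direction = persona_vector(pos_acts, neg_acts)
score = neutral_score(neutral_act, anchor, direction)
print(score)  # 0.5
```

In this toy case the neutral activation sits halfway between the two contrastive means along the trait axis, and its large second component does not affect the score because it is orthogonal to the axis. That property is one plausible reading of why an axis-based score could be less sensitive to prompt variation than questionnaire answers, though the paper's own argument may differ.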


Taxonomy

Topics: Persona Design and Applications · Machine Learning in Healthcare · Topic Modeling