Exploring the Personality Traits of LLMs through Latent Features Steering
Shu Yang, Shenzhe Zhu, Liang Liu, Lijie Hu, Mengdi Li, Di Wang

TL;DR
This paper investigates how latent features within large language models encode factors, such as cultural norms and environmental stressors, that shape the models' personality traits. It proposes a training-free method for steering those features and examines what this means for interpretability and safety.
Contribution
The paper introduces a training-free approach that extracts and steers latent features related to personality traits in LLMs, so the model's behavior can be modified and studied without retraining.
Findings
Latent features can be manipulated to alter perceived personality traits.
Personality traits in LLMs are influenced by cultural and environmental factors.
Steering latent features impacts model safety and behavior.
Abstract
Large language models (LLMs) have significantly advanced dialogue systems and role-playing agents through their ability to generate human-like text. While prior studies have shown that LLMs can exhibit distinct and consistent personalities, the mechanisms through which these models encode and express specific personality traits remain poorly understood. To address this, we investigate how various factors, such as cultural norms and environmental stressors, encoded within LLMs, shape their personality traits, guided by the theoretical framework of social determinism. Inspired by related work on LLM interpretability, we propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model, thereby eliminating the need for retraining. Furthermore, we analyze the implications of these factors for model safety,…
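The abstract describes extracting a latent feature for a factor and steering the model along it without retraining. The sketch below illustrates one common way this kind of steering is done, a difference-of-means direction added to a hidden state. It is an assumption made for illustration only: the paper's actual extraction procedure is not given in this summary, and the function names (`steering_vector`, `steer`), the strength parameter `alpha`, and the toy activations are all hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(pos_acts, neg_acts):
    """Difference of means: points from 'factor absent' toward 'factor present'.

    Assumption: this is a generic steering recipe, not the paper's method.
    """
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def steer(hidden, direction, alpha):
    """Shift one hidden state along the direction.

    alpha sets the strength; a negative alpha suppresses the factor.
    """
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional activations (made-up numbers, not from the paper).
with_factor = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
without_factor = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

v = steering_vector(with_factor, without_factor)
steered = steer([0.5, 0.5, 0.5], v, alpha=2.0)
print(v, steered)
```

In a real model the activations would be collected from a chosen layer on prompts that do and do not express the factor, and the shift would be applied to that layer's output at inference time. No weights change, which is what makes this style of intervention training-free.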
Taxonomy
Topics: Artificial Intelligence in Law
