Neuron-based Personality Trait Induction in Large Language Models
Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen

TL;DR
This paper introduces a neuron-based method for inducing specific personality traits in large language models, using a new dataset and neuron manipulation techniques to achieve trait control without retraining.
Contribution
It presents a novel neuron identification and manipulation approach for personality trait induction in LLMs, grounded in a large-scale psychological dataset.
Findings
Effective neuron-based trait induction comparable to fine-tuning
Efficient method requires no model retraining
Resources and dataset publicly available
Abstract
Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, in this paper, we present a neuron-based approach for personality trait induction in LLMs, with three major technical contributions. First, we construct PersonalityBench, a large-scale dataset for identifying and evaluating personality traits in LLMs. This dataset is grounded in the Big Five personality traits from psychology and is designed to assess the generative capabilities of LLMs towards specific personality traits. Second, by leveraging PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that…
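The abstract describes identifying personality-related neurons by contrasting the opposite aspects of a trait, then manipulating those neurons to induce the trait. The sketch below illustrates that general idea on toy data only. The scoring rule (difference in firing rate), the threshold, the zero-and-boost edit, and all function names are assumptions made for illustration; they are not taken from the paper, whose exact procedure is not given in this summary.

```python
from typing import Dict, List

# Hypothetical sketch: every name and rule here is an illustrative assumption,
# not the paper's actual method.

def activation_rate(activations: List[List[float]]) -> List[float]:
    """Fraction of samples on which each neuron fires (activation > 0)."""
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(1 for row in activations if row[j] > 0) / n_samples
        for j in range(n_neurons)
    ]

def select_trait_neurons(
    pos_acts: List[List[float]],
    neg_acts: List[List[float]],
    threshold: float = 0.5,
) -> Dict[str, List[int]]:
    """Pick neurons whose firing rate differs strongly between the two
    opposite aspects of a trait (assumed scoring rule)."""
    pos_rate = activation_rate(pos_acts)
    neg_rate = activation_rate(neg_acts)
    diffs = [p - n for p, n in zip(pos_rate, neg_rate)]
    return {
        "positive": [j for j, d in enumerate(diffs) if d >= threshold],
        "negative": [j for j, d in enumerate(diffs) if -d >= threshold],
    }

def induce(
    hidden: List[float], neurons: Dict[str, List[int]], boost: float = 1.0
) -> List[float]:
    """Return a copy of `hidden` with opposite-aspect neurons zeroed and
    target-aspect neurons increased by `boost` (assumed manipulation)."""
    out = list(hidden)
    for j in neurons["negative"]:
        out[j] = 0.0
    for j in neurons["positive"]:
        out[j] += boost
    return out

# Toy activations for 4 neurons over 3 prompts per aspect (made-up numbers).
high_aspect = [[1.0, 0.0, 0.5, 0.0], [0.8, 0.0, 0.0, 0.0], [1.2, -0.1, 0.3, 0.0]]
low_aspect = [[0.0, 0.9, 0.4, 0.0], [0.0, 1.1, 0.0, 0.0], [-0.2, 0.7, 0.2, 0.0]]

neurons = select_trait_neurons(high_aspect, low_aspect)
steered = induce([0.5, 0.75, 0.25, 0.0], neurons)
```

In a real model, activations like these would be collected from the feed-forward layers while the model answers trait-specific prompts, and the edit would be applied at inference time, which is why no retraining is involved.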
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Mental Health via Writing
