Neuron-based Personality Trait Induction in Large Language Models

Jia Deng; Tianyi Tang; Yanbin Yin; Wenhao Yang; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2410.12327·cs.CL·October 17, 2024

TL;DR

This paper introduces a neuron-based method for inducing specific personality traits in large language models, using a new dataset and neuron manipulation techniques to achieve trait control without retraining.

Contribution

It presents a novel neuron identification and manipulation approach for personality trait induction in LLMs, grounded in a large-scale psychological dataset.

Findings

01. Effective neuron-based trait induction comparable to fine-tuning
02. Efficient method requires no model retraining
03. Resources and dataset publicly available

Abstract

Large language models (LLMs) have become increasingly proficient at simulating various personality traits, an important capability for supporting related applications (e.g., role-playing). To further improve this capacity, we present a neuron-based approach for personality trait induction in LLMs, with three major technical contributions. First, we construct PersonalityBench, a large-scale dataset for identifying and evaluating personality traits in LLMs. This dataset is grounded in the Big Five personality traits from psychology and is designed to assess the generative capabilities of LLMs towards specific personality traits. Second, by leveraging PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait. Third, we develop a simple yet effective induction method that…
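The abstract describes two steps: identifying personality-related neurons by contrasting the opposite aspects of a trait, then inducing the trait by manipulating those neurons. Because the abstract is cut off before the induction details, the following is only a minimal sketch under stated assumptions, not the RUCAIBox/NPTI implementation. It assumes we already have per-sample activation vectors recorded on responses showing the high and low pole of one Big Five trait, ranks each neuron by the absolute difference of its mean activations across the two poles, and, as a hypothetical stand-in for the paper's induction step, overwrites the selected neurons with the target pole's mean. The names `select_trait_neurons` and `induce` are hypothetical.

```python
from statistics import mean


def select_trait_neurons(pos_acts, neg_acts, top_k):
    """Hypothetical contrastive neuron selection (not the paper's exact rule).

    pos_acts / neg_acts: lists of activation vectors (one per sample) recorded
    on responses showing the high and low pole of one trait; all vectors share
    the same width. Returns up to top_k tuples (neuron_index, pos_mean, neg_mean)
    ranked by |pos_mean - neg_mean|, ties broken by the lower index.
    """
    width = len(pos_acts[0])
    scored = []
    for j in range(width):
        pos_mean = mean(row[j] for row in pos_acts)
        neg_mean = mean(row[j] for row in neg_acts)
        scored.append((abs(pos_mean - neg_mean), j, pos_mean, neg_mean))
    # Largest gap first; the neuron index makes the ordering deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(j, p, n) for _, j, p, n in scored[:top_k]]


def induce(activation, selected, toward_positive=True):
    """Hypothetical induction: return a copy of one activation vector in which
    each selected neuron is set to the mean observed on the target pole."""
    out = list(activation)
    for j, pos_mean, neg_mean in selected:
        out[j] = pos_mean if toward_positive else neg_mean
    return out


# Toy data (invented for illustration): 2 samples per pole, 3 neurons.
high = [[1.0, 0.2, 0.0], [0.8, 0.1, 0.1]]
low = [[0.0, 0.3, 0.1], [0.2, 0.2, 0.0]]
selected = select_trait_neurons(high, low, top_k=1)
print(selected[0][0])  # 0: neuron 0 has the largest gap between poles
print(induce([0.0, 0.0, 0.0], selected))
```

In a real model, the activation vectors would be captured from, for example, feed-forward layers via PyTorch forward hooks (`register_forward_hook`), and the edited values written back during generation; that wiring is omitted here. Ranking by mean difference is just one simple contrast criterion, and the paper's actual selection and induction rules may differ.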

Code & Models

Repositories

RUCAIBox/NPTI
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Mental Health via Writing