Editing Personality for Large Language Models

Shengyu Mao; Xiaohan Wang; Mengru Wang; Yong Jiang; Pengjun Xie; Fei Huang; Ningyu Zhang

arXiv:2310.02168·cs.CL·September 4, 2024·2 cites

TL;DR

This paper proposes a new task and dataset for editing the personality traits of Large Language Models, enabling controlled personality expression in responses based on social psychology theory.

Contribution

It introduces the PersonalityEdit benchmark dataset and explores methods for editing LLMs' personality traits aligned with social psychology concepts.

Findings

01 Identified challenges in editing personality traits of LLMs

02 Demonstrated the feasibility of trait-specific response generation

03 Highlighted remaining issues in personality editing

Abstract

This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct PersonalityEdit, a new benchmark dataset to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that align with a specified topic and embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our findings uncover…
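The task setup described in the abstract (an opinion question about a topic, paired with a target trait) can be sketched as a small data structure. This is only an illustrative sketch: the class name `PersonalityEditRequest`, the helper `build_requests`, and the question template are hypothetical and are not taken from the paper or from the PersonalityEdit dataset. Only the three trait names come from the abstract.

```python
from dataclasses import dataclass

# The three traits named in the abstract.
TRAITS = ("Neuroticism", "Extraversion", "Agreeableness")


@dataclass(frozen=True)
class PersonalityEditRequest:
    """Hypothetical record: one topic paired with one target trait."""
    topic: str
    target_trait: str

    def __post_init__(self):
        # Reject traits outside the three used by the benchmark.
        if self.target_trait not in TRAITS:
            raise ValueError(f"unsupported trait: {self.target_trait!r}")

    def question(self) -> str:
        # Assumed opinion-question template; the real prompts are not shown here.
        return f"What do you think of {self.topic}?"


def build_requests(topics):
    """Pair every topic with every trait (hypothetical helper)."""
    return [PersonalityEditRequest(t, trait) for t in topics for trait in TRAITS]
```

Under these assumptions, an edited model would be asked `question()` for a topic and its answer judged against `target_trait`.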

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 3

Strengths

The research direction of personality editing presents a compelling and possibly influential field of study. The authors have developed a corresponding dataset through the data generation capabilities of GPT-4. Additionally, they offer a series of insightful experiments employing various baseline models within the task of personality editing. These models are benchmarked against the dataset, providing valuable findings that enhance our understanding of the subject.

Weaknesses

The paper's exploration of personality editing in language models is certainly an intriguing endeavor, but there are aspects that invite scrutiny regarding its novelty and significance: The primary contribution of the paper is the introduction of the PersonalityEdit dataset, benchmarking established methods in the context of this new dataset. However, the paper could benefit from a more detailed analysis of the dataset construction process, particularly since the dataset is generated by prompti

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

- The paper addresses a unique and intriguing topic - adjusting the personality of LLMs. This is a fresh perspective on the capabilities of LLMs beyond their usual tasks.
- The paper employs a combination of knowledge editing and a scoring system to evaluate the alignment of LLM responses with specific personality traits. T
- The authors have acknowledged the potential biases in the pre-training corpus and the possibility of eliciting offensive or discriminatory content. This shows a responsibl

Weaknesses

- Lack of significance tests
- The manuscript needs reorganization since many important points are in the Appendix
- Partial assessment of personality traits

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- The paper introduces a new automatically generated dataset aligned with 3 of the Big Five personality traits, which can be used to understand how LLMs interpret these traits.
- The data generation process is simple and easy to follow.
- The authors evaluate the collected dataset against existing baselines and highlight the challenges.

Weaknesses

- Motivation for the task is quite weak. I am not convinced that the task is novel. It is probably style transfer or conditional text generation by prompting LLMs, where personality defines the style of text. While they contrast it with style transfer, the justification on how it is different is not well supported.
- While the paper is easy to follow given the simplicity of the task, it fails to give you a comprehensive overview of the properties of the dataset.

Minor:
- In abstract the last

Code & Models

Repositories

zjunlp/easyedit (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property · Library Science and Information Systems

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization