Extroversion or Introversion? Controlling The Personality of Your Large   Language Models

Yanquan Chen; Zhen Wu; Junjie Guo; Shujian Huang; Xinyu Dai

arXiv:2406.04583·cs.CL·June 10, 2024

Extroversion or Introversion? Controlling The Personality of Your Large Language Models

Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai

PDF

Open Access 1 Repo

TL;DR

This paper investigates methods to control the personalities of large language models, proposing a novel prompt induction approach that outperforms existing techniques in efficacy and robustness.

Contribution

It introduces PISF, a combined method of supervised fine-tuning and prompt induction, demonstrating superior control and stability of LLM personalities.

Findings

01

Prompt induction is most effective but less robust.

02

Supervised fine-tuning offers higher control success than RLHF.

03

PISF combines strengths of both methods for optimal control.

Abstract

Large language models (LLMs) exhibit robust capabilities in text generation and comprehension, mimicking human behavior and exhibiting synthetic personalities. However, some LLMs have displayed offensive personality, propagating toxic discourse. Existing literature neglects the origin and evolution of LLM personalities, as well as the effective personality control. To fill these gaps, our study embarked on a comprehensive investigation into LLM personality control. We investigated several typical methods to influence LLMs, including three training methods: Continual Pre-training, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with inference phase considerations (prompts). Our investigation revealed a hierarchy of effectiveness in control: Prompt > SFT > RLHF > Continual Pre-train. Notably, SFT exhibits a higher control success rate compared to…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

DespairL/Personality
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling

MethodsShrink and Fine-Tune