Extroversion or Introversion? Controlling The Personality of Your Large Language Models
Yanquan Chen, Zhen Wu, Junjie Guo, Shujian Huang, Xinyu Dai

TL;DR
This paper investigates methods to control the personalities of large language models, proposing a novel prompt induction approach that outperforms existing techniques in efficacy and robustness.
Contribution
It introduces PISF, a combined method of supervised fine-tuning and prompt induction, demonstrating superior control and stability of LLM personalities.
Findings
Prompt induction is most effective but less robust.
Supervised fine-tuning offers higher control success than RLHF.
PISF combines strengths of both methods for optimal control.
Abstract
Large language models (LLMs) exhibit robust capabilities in text generation and comprehension, mimicking human behavior and exhibiting synthetic personalities. However, some LLMs have displayed offensive personality, propagating toxic discourse. Existing literature neglects the origin and evolution of LLM personalities, as well as the effective personality control. To fill these gaps, our study embarked on a comprehensive investigation into LLM personality control. We investigated several typical methods to influence LLMs, including three training methods: Continual Pre-training, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF), along with inference phase considerations (prompts). Our investigation revealed a hierarchy of effectiveness in control: Prompt > SFT > RLHF > Continual Pre-train. Notably, SFT exhibits a higher control success rate compared to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling
MethodsShrink and Fine-Tune
