The Better Angels of Machine Personality: How Personality Relates to LLM Safety

Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao

arXiv:2407.12344 · cs.CL · July 18, 2024 · 1 citation

TL;DR

This study explores how personality traits in Large Language Models influence their safety behaviors, revealing that editing these traits can significantly improve safety performance and reduce jailbreak susceptibility.

Contribution

The paper is the first to analyze the relationship between LLM personality traits and safety abilities, and it shows that personality editing can improve both safety performance and resilience to jailbreak attacks.

Findings

1. LLMs' safety abilities are closely related to their personality traits.
2. Editing LLMs' personality traits can improve safety performance, with relative gains of up to approximately 43% (in privacy, when inducing a shift from ISTJ to ISTP).
3. Different personality traits affect susceptibility to jailbreak attacks.
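The 43% figure is a relative improvement: the change in a safety score divided by the score before editing. A minimal sketch of that arithmetic, assuming a higher-is-better safety score and a nonzero baseline; the function name and the toy scores are hypothetical and are not the paper's evaluation code or results.

```python
def relative_improvement(baseline: float, edited: float) -> float:
    """Relative change of a higher-is-better safety score, as a fraction.

    Hypothetical helper: assumes higher scores mean safer behaviour and
    that the baseline score is nonzero.
    """
    if baseline == 0:
        raise ValueError("baseline score must be nonzero")
    return (edited - baseline) / baseline


# Toy scores for illustration only; these are not numbers from the paper.
print(f"{relative_improvement(40.0, 50.0):.0%}")  # 25%
```

For a lower-is-better metric (for example a toxicity rate), the numerator would be flipped to baseline minus edited; the exact normalization the authors use is not stated on this page.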

Abstract

Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs remains unclear. In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxicity, privacy, and fairness, based on the reliable MBTI-M scale. Meanwhile, safety alignment generally increases various LLMs' Extraversion, Sensing, and Judging traits. According to such findings, we can edit LLMs' personality traits and improve their safety performance, e.g., inducing personality from ISTJ to ISTP resulted in a relative improvement of approximately 43% and 10% in privacy and fairness performance, respectively. Additionally, we find that LLMs with different personality…
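ISTJ and ISTP differ only on the Judging/Perceiving dichotomy, so the edit reported in the abstract moves a model along a single MBTI dimension. A minimal sketch of how four per-dichotomy preferences collapse into a four-letter type; the signed-score representation, the tie-breaking rule, and all names here are hypothetical, and this is not the MBTI-M scoring procedure used in the paper.

```python
from dataclasses import dataclass

# The four MBTI dichotomies, each as (first pole, second pole).
DICHOTOMIES = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


@dataclass(frozen=True)
class PreferenceScores:
    """Hypothetical signed preference per dichotomy.

    Positive values favour the first pole (E, S, T, J); zero or negative
    values favour the second pole (I, N, F, P).
    """
    ei: float
    sn: float
    tf: float
    jp: float


def mbti_type(scores: PreferenceScores) -> str:
    """Map the four signed preferences to a four-letter type string."""
    values = (scores.ei, scores.sn, scores.tf, scores.jp)
    return "".join(first if v > 0 else second
                   for (first, second), v in zip(DICHOTOMIES, values))


def changed_dimensions(src: str, dst: str) -> list[str]:
    """Return the dichotomies (e.g. 'J/P') on which two types differ."""
    return [f"{a}/{b}" for (a, b), x, y in zip(DICHOTOMIES, src, dst) if x != y]


print(mbti_type(PreferenceScores(-0.4, 0.7, 0.5, 0.6)))  # ISTJ
print(changed_dimensions("ISTJ", "ISTP"))  # ['J/P']
```

Under this representation, an edit from ISTJ to ISTP only needs to push the J/P preference across zero while leaving the other three dichotomies unchanged.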

Code & Models

Repositories

tmylla/Persafety (Official)

Taxonomy

Topics: Law, AI, and Intellectual Property · Artificial Intelligence in Law · Digital Rights Management and Security