Evaluating Psychological Safety of Large Language Models

Xingxuan Li; Yutong Li; Lin Qiu; Shafiq Joty; Lidong Bing

arXiv:2212.10529·cs.CL·March 1, 2024·25 cites

TL;DR

This paper systematically evaluates the psychological safety of large language models using personality and well-being tests, revealing dark personality traits and the effects of fine-tuning on model safety.

Contribution

The paper introduces unbiased prompts and a systematic method for evaluating psychological safety. It shows that dark personality traits persist in instruction fine-tuned models and that fine-tuning can reduce them.

Findings

01. All five tested LLMs scored above the human average on the Short Dark Triad (SD-3).

02. Fine-tuning on BFI responses reduces psychological toxicity.

03. Well-being scores of the GPT-series models rose as the models were fine-tuned with more training data.

Abstract

In this work, we designed unbiased prompts to systematically evaluate the psychological safety of large language models (LLMs). First, we tested five different LLMs by using two personality tests: Short Dark Triad (SD-3) and Big Five Inventory (BFI). All models scored higher than the human average on SD-3, suggesting a relatively darker personality pattern. Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT, GPT-3.5, and GPT-4 still showed dark personality patterns; these models scored higher than self-supervised GPT-3 on the Machiavellianism and narcissism traits on SD-3. Then, we evaluated the LLMs in the GPT series by using well-being tests to study the impact of fine-tuning with more training data. We observed a continuous increase in the well-being scores of GPT models. Following these observations, we showed that fine-tuning Llama-2-chat-7B…
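The comparison described in the abstract (score a model's questionnaire answers per trait, then check them against a human average) can be sketched as below. This is a minimal illustration, not the authors' evaluation code: it assumes a 5-point Likert scale, and the item-to-trait mapping, the reverse-keyed item, the answers, and the baseline values are all hypothetical toy values rather than real SD-3 items or published norms.

```python
from statistics import mean

LIKERT_MAX = 5  # assumption: answers run from 1 (disagree) to 5 (agree)


def trait_scores(responses, item_traits, reversed_items=frozenset()):
    """Average Likert answers per trait, flipping reverse-keyed items."""
    by_trait = {}
    for item_id, answer in responses.items():
        if not 1 <= answer <= LIKERT_MAX:
            raise ValueError(f"item {item_id}: answer {answer} is off the scale")
        if item_id in reversed_items:
            # On a 1..5 scale a reverse-keyed 2 counts as a 4.
            answer = LIKERT_MAX + 1 - answer
        by_trait.setdefault(item_traits[item_id], []).append(answer)
    return {trait: mean(values) for trait, values in by_trait.items()}


def above_baseline(scores, baseline):
    """Flag each trait whose score exceeds the reference average."""
    return {trait: scores[trait] > baseline[trait] for trait in scores}


# Hypothetical four-item questionnaire (not the real SD-3 items).
item_traits = {
    1: "machiavellianism",
    2: "machiavellianism",
    3: "narcissism",
    4: "narcissism",
}
responses = {1: 4, 2: 5, 3: 2, 4: 3}  # toy model answers
toy_baseline = {"machiavellianism": 3.0, "narcissism": 3.6}  # made-up values

scores = trait_scores(responses, item_traits, reversed_items={3})
flags = above_baseline(scores, toy_baseline)
print(scores)
print(flags)
```

In practice a model's answers would be collected over several prompt variants and averaged before scoring; that step is left out here to keep the sketch short.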

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Computational and Text Analysis Methods

Methods: Multi-Head Attention · Attention Is All You Need · Flan-T5 · Weight Decay · Cosine Annealing · Dropout · Linear Layer