Realistic Evaluation of Toxicity in Large Language Models

Tinh Son Luong; Thanh-Thien Le; Linh Ngo Van; Thien Huu Nguyen

arXiv:2405.10659·cs.CL·May 21, 2024

TL;DR

This paper introduces the TET dataset to rigorously evaluate toxicity in large language models, revealing hidden biases and limitations of current safety measures through carefully crafted prompts.

Contribution

The paper presents the TET dataset, a new benchmark for testing toxicity in LLMs, exposing vulnerabilities in existing safety mechanisms.

Findings

01. TET uncovers toxicity issues hidden by standard prompts.

02. Current safety layers can be bypassed with minimal prompt engineering.

03. Evaluation with TET reveals subtler toxicity problems in popular LLMs.

Abstract

Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus…
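
The comparison the abstract describes, scoring a model's responses to ordinary prompts against its responses to engineered prompts, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: `evaluate_toxicity`, `ToxicityReport`, `toy_generate`, `toy_score`, the `[engineered]` marker, and the 0.5 threshold are all hypothetical names and values chosen for the example. In a real setup, `generate` would call the LLM under test and `score` would call a toxicity classifier; the TET dataset's schema is not given on this page, so the prompts here are harmless placeholders.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class ToxicityReport:
    mean_score: float   # average toxicity score over all responses
    max_score: float    # worst-case toxicity score
    toxic_rate: float   # fraction of responses at or above the threshold


def evaluate_toxicity(
    prompts: Sequence[str],
    generate: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> ToxicityReport:
    """Generate one response per prompt and summarize the toxicity scores."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    scores = [score(generate(p)) for p in prompts]
    flagged = sum(1 for s in scores if s >= threshold)
    return ToxicityReport(mean(scores), max(scores), flagged / len(scores))


# Stand-ins for a real model and a real toxicity classifier (hypothetical).
def toy_generate(prompt: str) -> str:
    # Pretend the safeguard fails only on prompts carrying the marker.
    return "<TOXIC> reply" if "[engineered]" in prompt else "polite reply"


def toy_score(text: str) -> float:
    # Pretend classifier: 1.0 if the placeholder token is present, else 0.0.
    return 1.0 if "<TOXIC>" in text else 0.0


plain_prompts = ["hello", "tell me a story"]
engineered_prompts = ["[engineered] hello", "[engineered] tell me a story"]

plain_report = evaluate_toxicity(plain_prompts, toy_generate, toy_score)
engineered_report = evaluate_toxicity(engineered_prompts, toy_generate, toy_score)

print("plain:", plain_report)
print("engineered:", engineered_report)
```

Passing `generate` and `score` in as callables keeps the loop independent of any particular model or classifier, so the same summary can be computed for several LLMs and both prompt sets.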

Code & Models

Datasets

convoicon/Thoroughly_Engineered_Toxicity
dataset · 8 dl

Taxonomy

Topics: Machine Learning in Materials Science