RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Adrian de Wynter; Ishaan Watts; Tua Wongsangaroonsri; Minghui Zhang; Noura Farra; Nektar Ege Altıntoprak; Lena Baur; Samantha Claudet; Pavel Gajdusek; Can Gören; Qilong Gu; Anna Kaminska; Tomasz Kaminski; Ruby Kuo; Akiko Kyuba; Jongho Lee; Kartik Mathur; Petter Merok; Ivana Milovanović; Nani Paananen; Vesa-Matti Paananen; Anna Pavlenko; Bruno Pereira Vidal; Luciano Strika; Yueh Tsao; Davide Turcato; Oleksandr Vakhno; Judit Velcsov; Anna Vickers; Stéphanie Visser; Herdyan Widarmanto; Andrey Zaikin; Si-Qing Chen

arXiv:2404.14397 · cs.CL · May 5, 2025

TL;DR

This paper introduces RTP-LX, a multilingual, human-annotated dataset for evaluating LLMs' ability to detect toxicity across 28 languages, highlighting current models' limitations in cultural sensitivity and context understanding.

Contribution

The paper presents RTP-LX, a novel multilingual toxicity evaluation dataset with cultural considerations, and assesses LLMs' performance, revealing gaps in holistic and context-aware toxicity detection.

Findings

01 Models reach acceptable accuracy but show low agreement with human judgments.

02 Models have difficulty detecting subtle, context-dependent harm such as microaggressions.

03 The dataset aims to improve multilingual safety evaluations of LLMs.
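
The gap between accuracy and agreement in the first finding can be illustrated with a minimal sketch. It assumes binary toxic/non-toxic labels and uses Cohen's kappa as the agreement statistic; this page does not say which statistic the paper reports, so that choice is an assumption. The labels are invented toy data, not taken from RTP-LX, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of items where the model label equals the human label."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(human)
    p_observed = accuracy(human, model)
    human_counts, model_counts = Counter(human), Counter(model)
    labels = set(human) | set(model)
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(
        (human_counts[label] / n) * (model_counts[label] / n) for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented toy labels (1 = toxic, 0 = not toxic); NOT taken from RTP-LX.
human_labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
model_labels = [1] * 10  # hypothetical model that flags every item as toxic

print(accuracy(human_labels, model_labels))     # 0.8
print(cohen_kappa(human_labels, model_labels))  # 0.0
```

On a corpus where most items are toxic, a model that flags everything scores 80% accuracy in this toy case while its kappa is zero, which is one way acceptable accuracy and low agreement can coexist.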

Abstract

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when scoring holistically the toxicity of…

Code & Models

Repositories

microsoft/rtp-lx (PyTorch, official)

Datasets

adewynter/RTP-LX
dataset · 5 dl

Videos

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Taxonomy

Topics: Natural Language Processing Techniques