Something Just Like TRuST: Toxicity Recognition of Span and Target

Berk Atil; Namrata Sureddy; Rebecca J. Passonneau

arXiv:2506.02326 · cs.CL · January 7, 2026

TL;DR

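TRuST is a large-scale toxicity dataset (~300k annotations, with high-quality human annotation on ~11k) built on a unified definition of toxicity. It is used to benchmark state-of-the-art LLMs and fine-tuned pre-trained models on toxicity detection, target group identification, and toxic span identification, aiding safer language technology development.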

Contribution

The paper introduces TRuST, a comprehensive toxicity dataset with a rigorous annotation process and benchmarks for evaluating LLMs' toxicity detection capabilities.

Findings

01 Fine-tuned PLMs outperform LLMs on all three tasks: toxicity detection, target group identification, and toxic span identification.

02 Current reasoning models do not reliably improve performance.

03 TRuST provides a valuable resource for toxicity evaluation and mitigation.

Abstract

Toxic language includes content that is offensive, abusive, or that promotes harm. Progress in preventing toxic output from large language models (LLMs) is hampered by inconsistent definitions of toxicity. We introduce TRuST, a large-scale dataset that unifies and expands prior resources through a carefully synthesized definition of toxicity and a corresponding annotation scheme. It consists of ~300k annotations, with high-quality human annotation on ~11k. To ensure high quality, we designed a rigorous, multi-stage human annotation process and evaluated the diversity of the annotators. We then benchmarked state-of-the-art LLMs and pre-trained models on three tasks: toxicity detection, identification of the target group, and identification of toxic words. Our results indicate that fine-tuned PLMs outperform LLMs on the three tasks, and that current reasoning models do not reliably improve performance.…
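To make the three benchmark tasks concrete, the following is a minimal sketch of how one annotated example and its scoring might be represented. Everything here is an assumption for illustration: the `Example` fields, the `accuracy` and `word_f1` helpers, and the choice of metrics are hypothetical and are not taken from the TRuST release or from the paper's evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence, Set


@dataclass
class Example:
    """One annotated text. Field names are hypothetical, not the TRuST schema."""
    text: str
    is_toxic: bool                       # task 1: toxicity detection
    target_group: Optional[str] = None   # task 2: targeted group, None if no target
    toxic_words: Set[str] = field(default_factory=set)  # task 3: toxic words


def accuracy(gold: Sequence, pred: Sequence) -> float:
    """Fraction of positions where gold and predicted labels agree."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def word_f1(gold: Set[str], pred: Set[str]) -> float:
    """Set-overlap F1 between gold and predicted toxic words (assumed metric)."""
    if not gold and not pred:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder tokens stand in for real text; no dataset content is reproduced.
gold = Example("w1 w2 w3", is_toxic=True, target_group="group_a",
               toxic_words={"w1", "w2"})
pred = Example("w1 w2 w3", is_toxic=True, target_group="group_b",
               toxic_words={"w2", "w3"})

detection_acc = accuracy([gold.is_toxic], [pred.is_toxic])
target_acc = accuracy([gold.target_group], [pred.target_group])
span_f1 = word_f1(gold.toxic_words, pred.toxic_words)
```

Keeping the three labels on one record reflects that each text can be scored on all three tasks at once. A real evaluation would aggregate over many examples and may use different metrics than the ones assumed here.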

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning