Argument-Based Consistency in Toxicity Explanations of LLMs

Ramaravind Kommiya Mothilal; Joanna Roy; Syed Ishtiaque Ahmed; Shion Guha

arXiv:2506.19113 · cs.CL · January 27, 2026

TL;DR

This paper introduces Argument-based Consistency (ArC), a new evaluation framework for assessing the logical coherence of LLMs' toxicity explanations, revealing their reasoning limitations on complex prompts.

Contribution

It proposes a theoretically-grounded, multi-dimensional criterion (ArC) and six metrics to evaluate LLMs' toxicity explanation consistency, addressing limitations of existing methods.

Findings

01 LLMs generate plausible explanations for simple prompts.

02 Reasoning about toxicity becomes inconsistent with nuanced prompts.

03 Code and explanations are open-sourced for future research.

Abstract

The discourse around toxicity and LLMs in NLP largely revolves around detection tasks. This work shifts the focus to evaluating LLMs' reasoning about toxicity - from their explanations that justify a stance - to enhance their trustworthiness in downstream tasks. Despite extensive research on explainability, it is not straightforward to adopt existing methods to evaluate free-form toxicity explanation due to their over-reliance on input text perturbations, among other challenges. To account for these, we propose a novel, theoretically-grounded multi-dimensional criterion, Argument-based Consistency (ArC), that measures the extent to which LLMs' free-form toxicity explanations reflect an ideal and logical argumentation process. Based on uncertainty quantification, we develop six metrics for ArC to comprehensively evaluate the (in)consistencies in LLMs' toxicity explanations. We conduct…

Taxonomy

Topics: Biomedical Ethics and Regulation · Ethics in Clinical Research · Ethics in medical practice