Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

Junyu Lu; Bo Xu; Xiaokun Zhang; Kaiyuan Liu; Dongyu Zhang; Liang Yang; Hongfei Lin

arXiv:2406.00983·cs.CL·June 4, 2024

TL;DR

This paper introduces a novel causal inference framework to effectively distinguish and remove misleading lexical biases in toxic language detection, improving accuracy and fairness.

Contribution

It proposes a Counterfactual Causal Debiasing Framework that preserves useful lexical effects while eliminating misleading biases, enhancing model performance.

Findings

01. Achieves state-of-the-art accuracy and fairness in toxic language detection.

02. Outperforms existing debiasing methods on out-of-distribution data.

03. Demonstrates effective separation of useful and misleading lexical biases.

Abstract

Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them indiscriminately, resulting in a degradation in the detection accuracy of the model. To this end, we propose a Counterfactual Causal Debiasing Framework (CCDF) to mitigate lexical bias in TLD. It preserves the "useful impact" of lexical bias and eliminates the "misleading impact". Specifically, we first represent the total effect of the original sentence and biased tokens on decisions from a causal view. We then conduct counterfactual inference to exclude the direct causal effect of lexical bias…
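The counterfactual step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the two-branch setup, the additive fusion of logits, the neutral reference input, the `alpha` weight, and all function names are assumptions made for illustration. It shows only the general idea of subtracting the prediction a model would make from the biased tokens alone (a counterfactual) from the prediction it makes on the full input (the factual case).

```python
# Illustrative sketch only; names, fusion rule, and alpha are assumptions,
# not taken from the paper or its repository.

def fuse(sentence_logits, bias_logits):
    """Hypothetical fusion: element-wise sum of the two branches' logits."""
    return [s + b for s, b in zip(sentence_logits, bias_logits)]

def debiased_logits(sentence_logits, bias_logits, neutral_logits, alpha=1.0):
    """Subtract the counterfactual (bias-only) effect from the factual one.

    factual:        real sentence context + biased tokens
    counterfactual: neutral reference context + the same biased tokens
    alpha:          assumed weight on the subtracted effect; alpha < 1
                    leaves part of the lexical signal in the prediction
    """
    factual = fuse(sentence_logits, bias_logits)
    counterfactual = fuse(neutral_logits, bias_logits)
    return [f - alpha * c for f, c in zip(factual, counterfactual)]

def predict(logits):
    """Return the index of the largest logit (0 = non-toxic, 1 = toxic)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy example: the context leans non-toxic, but a flagged token
# pushes the fused prediction toward "toxic".
sentence = [1.0, 0.5]   # [non-toxic, toxic] from the sentence branch
bias = [0.0, 2.0]       # from the bias-only (token) branch
neutral = [0.0, 0.0]    # reference value for a "blank" context

factual = fuse(sentence, bias)
debiased = debiased_logits(sentence, bias, neutral)
```

In this toy case the factual logits are `[1.0, 2.5]`, so the fused model predicts "toxic", while the debiased logits are `[1.0, 0.5]` and the prediction flips to "non-toxic". With additive fusion and `alpha=1.0` the bias branch cancels completely; how the paper retains the "useful impact" of lexical cues is not specified in the abstract, so the `alpha` knob here is only a stand-in for that idea.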

Code & Models

Repositories

DUT-lujunyu/Debias
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Deception detection and forensic psychology