Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information

Isar Nejadgholi, Esma Balkır, Kathleen C. Fraser, and Svetlana Kiritchenko

arXiv:2210.10689 · cs.CL · October 20, 2022

TL;DR

This paper investigates how a toxic language classifier uses sentiment and identity terms, revealing biases and guiding future debiasing efforts to improve fairness in toxic language detection.

Contribution

It applies a concept-based explanation framework to analyze how sentiment and identity-term features interact in a multi-class toxic language classifier, exposing biases in how the classifier uses sentiment.

Findings

1. For some classes the classifier has learned sentiment information as expected, but this information is outweighed by the influence of identity terms.

2. The classifier's sensitivity to sentiment varies across classes.

3. The results can inform debiasing strategies for fairer toxic language models.

Abstract

Previous works on the fairness of toxic language classifiers compare the output of models with different identity terms as input features but do not consider the impact of other important concepts present in the context. Here, besides identity terms, we take into account high-level latent features learned by the classifier and investigate the interaction between these features and identity terms. For a multi-class toxic language classifier, we leverage a concept-based explanation framework to calculate the sensitivity of the model to the concept of sentiment, which has been used before as a salient feature for toxic language detection. Our results show that although for some classes, the classifier has learned the sentiment information as expected, this information is outweighed by the influence of identity terms as input features. This work is a step towards evaluating procedural…
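The abstract does not spell out how the sensitivity to a concept is computed. As a rough illustration only, the sketch below assumes a TCAV-style setup (Kim et al., 2018): a concept direction is estimated in a hidden layer, and the score for a class is the fraction of inputs whose class logit increases when the activation moves along that direction. The toy classifier head, the random activations, the function names, and the use of a difference of means in place of a trained linear probe are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    # Simplification (assumption): use the difference of mean activations
    # instead of the weights of a trained linear probe.
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def concept_sensitivity(head_weights, hidden_acts, cav, class_idx):
    # Toy classifier head (assumption): logits = tanh(h) @ W, so the gradient
    # of logit k with respect to h is (1 - tanh(h)^2) * W[:, k].
    grads = (1.0 - np.tanh(hidden_acts) ** 2) * head_weights[:, class_idx]
    # Directional derivative of the class logit along the concept direction.
    directional = grads @ cav
    # Fraction of inputs for which the logit increases along the concept.
    return float(np.mean(directional > 0))


rng = np.random.default_rng(0)
dim = 8

# Synthetic hidden activations: concept examples are shifted along dimension 0,
# standing in for a hypothetical "negative sentiment" direction.
random_acts = rng.normal(size=(200, dim))
concept_acts = rng.normal(size=(200, dim))
concept_acts[:, 0] += 3.0

cav = concept_activation_vector(concept_acts, random_acts)

# Hypothetical two-class head: class 0 reads dimension 0 positively,
# class 1 reads it negatively.
W = np.zeros((dim, 2))
W[0, 0] = 2.0
W[0, 1] = -2.0

inputs = rng.normal(size=(100, dim))
for k in range(2):
    print(k, concept_sensitivity(W, inputs, cav, k))
```

Under this definition, a score near 1 means that moving along the concept direction raises the class logit for almost all inputs, and a score near 0 means it lowers it. Comparing such scores across classes, and across inputs with and without identity terms, is one way the kind of interaction described in the abstract could be probed.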

Code & Models

Repositories

isarnejad/procedural-fairness-sentiment
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Explainable Artificial Intelligence (XAI)