Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts

Shaina Raza; Chen Ding; Deval Pandya

arXiv:2307.10213·cs.CL·July 21, 2023·2 cites

TL;DR

This paper introduces a two-step approach combining hate speech detection and prompt-based debiasing to reduce biases in online conversations, aiming to foster more inclusive communication environments.

Contribution

It presents a novel method that integrates hate speech classification with prompt-driven debiasing to mitigate biases in conversational data.

Findings

01. Reduced negativity in hate speech comments

02. Effective generation of less biased alternatives

03. Improved fairness in online discourse

Abstract

Discriminatory language and biases are often present in hate speech during conversations, and they usually have negative impacts on targeted groups, such as those defined by race, gender, or religion. To tackle this issue, we propose a two-step approach: first, detecting hate speech using a classifier, and then applying a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed a reduction in negativity due to hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.
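The two-step process described in the abstract (classify first, then prompt for a less biased rewrite) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_classifier`, `TOY_LEXICON`, `build_debias_prompt`, `PipelineResult`, and `run_pipeline` are hypothetical names, the keyword lexicon stands in for a trained classifier, and the prompt wording is invented for the example. The `generate` argument is a placeholder for whatever text-generation model would answer the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a trained hate speech classifier: a tiny
# keyword lexicon. A real system would use a learned model instead.
TOY_LEXICON = {"stupid", "worthless", "inferior"}


def toy_classifier(text: str) -> bool:
    """Return True if any lexicon word appears in the text (illustrative only)."""
    tokens = {tok.strip(".,!?;:").lower() for tok in text.split()}
    return bool(tokens & TOY_LEXICON)


def build_debias_prompt(text: str) -> str:
    """Build an instruction prompt asking for a less biased rewrite.

    The wording is an assumption made for this sketch, not the paper's prompt.
    """
    return (
        "Rewrite the following comment so that it keeps its main point "
        "but removes hateful or biased language.\n"
        f"Comment: {text}\n"
        "Rewritten comment:"
    )


@dataclass
class PipelineResult:
    original: str
    flagged: bool
    prompt: Optional[str]
    rewritten: Optional[str]


def run_pipeline(
    text: str,
    classify: Callable[[str], bool] = toy_classifier,
    generate: Optional[Callable[[str], str]] = None,
) -> PipelineResult:
    """Step 1: classify. Step 2: only flagged text is sent to the debiaser."""
    flagged = classify(text)
    if not flagged:
        # Text that is not flagged passes through unchanged.
        return PipelineResult(text, False, None, None)
    prompt = build_debias_prompt(text)
    # `generate` is a placeholder for a language model call; without one,
    # the sketch only returns the prompt that would be sent.
    rewritten = generate(prompt) if generate is not None else None
    return PipelineResult(text, True, prompt, rewritten)
```

Keeping the classifier and the generator as interchangeable callables means either step can be replaced independently, for example by swapping the lexicon for a fine-tuned model while leaving the prompt construction untouched.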

Taxonomy

Topics: Hate Speech and Cyberbullying Detection