U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario

Jiaxin Song; Xinyu Wang; Yihao Wang; Yifan Tang; Ru Zhang; Jianyi Liu; Gongshen Liu

arXiv:2501.00907 · cs.SD · January 3, 2025

Open Access

TL;DR

U-GIFT introduces an uncertainty-guided approach using Bayesian Neural Networks and self-training to improve toxic speech detection in few-shot scenarios, reducing reliance on large labeled datasets and enhancing robustness across domains.

Contribution

The paper presents a novel uncertainty-guided firewall method, U-GIFT, that leverages active learning and Bayesian Neural Networks for effective few-shot toxic speech detection.
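
As a rough illustration of what uncertainty-guided self-training can look like in general, the sketch below picks the unlabeled samples whose predictions are least uncertain and assigns them pseudo-labels. It is not the paper's procedure: the selection rule (lowest predictive entropy over several stochastic forward passes, standing in for sampling from a Bayesian Neural Network), the function names `predictive_entropy` and `select_pseudo_labels`, and the toy probabilities are all assumptions made for this example.

```python
import numpy as np


def predictive_entropy(prob_samples):
    """Average class probabilities over stochastic passes and score uncertainty.

    prob_samples has shape (T, N, C): T stochastic forward passes,
    N unlabeled samples, C classes. Hypothetical helper, not from the paper.
    """
    mean_probs = prob_samples.mean(axis=0)  # shape (N, C)
    # Entropy of the averaged prediction; the small constant avoids log(0).
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return mean_probs, entropy


def select_pseudo_labels(prob_samples, k):
    """Return indices and pseudo-labels of the k least-uncertain samples.

    Assumed selection rule: keep the k samples with the lowest entropy.
    """
    mean_probs, entropy = predictive_entropy(prob_samples)
    chosen = np.argsort(entropy)[:k]
    return chosen, mean_probs[chosen].argmax(axis=1)


# Toy data: 2 stochastic passes, 3 unlabeled samples, 2 classes
# (class 0 = non-toxic, class 1 = toxic; made-up numbers).
prob_samples = np.array([
    [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]],
    [[0.90, 0.10], [0.40, 0.60], [0.10, 0.90]],
])

idx, labels = select_pseudo_labels(prob_samples, k=2)
print(idx, labels)
```

In a full self-training loop, the selected samples and their pseudo-labels would be added to the labeled set and the model retrained; the ambiguous middle sample is left out because its averaged prediction is close to uniform.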

Findings

01

U-GIFT outperforms baselines by 14.92% in 5-shot settings.

02

It is adaptable to various pre-trained language models.

03

It demonstrates robustness in imbalanced and cross-domain scenarios.

Abstract

With the widespread use of social media, user-generated content has surged on online platforms. When such content includes hateful, abusive, offensive, or cyberbullying behavior, it is classified as toxic speech, posing a significant threat to the online ecosystem's integrity and safety. While manual content moderation is still prevalent, the overwhelming volume of content and the psychological strain on human moderators underscore the need for automated toxic speech detection. Previously proposed detection methods often rely on large annotated datasets; however, acquiring such datasets is both costly and challenging in practice. To address this issue, we propose an uncertainty-guided firewall for toxic speech in few-shot scenarios, U-GIFT, that utilizes self-training to enhance detection performance even when labeled data is limited. Specifically, U-GIFT combines active learning with…

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection