U-GIFT: Uncertainty-Guided Firewall for Toxic Speech in Few-Shot Scenario
Jiaxin Song, Xinyu Wang, Yihao Wang, Yifan Tang, Ru Zhang, Jianyi Liu, Gongshen Liu

TL;DR
U-GIFT introduces an uncertainty-guided approach using Bayesian Neural Networks and self-training to improve toxic speech detection in few-shot scenarios, reducing reliance on large labeled datasets and enhancing robustness across domains.
Contribution
The paper presents a novel uncertainty-guided firewall method, U-GIFT, that leverages active learning and Bayesian Neural Networks for effective few-shot toxic speech detection.
Findings
U-GIFT outperforms baselines by 14.92% in 5-shot settings.
It is adaptable to various pre-trained language models.
It demonstrates robustness in imbalanced and cross-domain scenarios.
Abstract
With the widespread use of social media, user-generated content has surged on online platforms. When such content includes hateful, abusive, offensive, or cyberbullying behavior, it is classified as toxic speech, posing a significant threat to the online ecosystem's integrity and safety. While manual content moderation is still prevalent, the overwhelming volume of content and the psychological strain on human moderators underscore the need for automated toxic speech detection. Previously proposed detection methods often rely on large annotated datasets; however, acquiring such datasets is both costly and challenging in practice. To address this issue, we propose an uncertainty-guided firewall for toxic speech in few-shot scenarios, U-GIFT, that utilizes self-training to enhance detection performance even when labeled data is limited. Specifically, U-GIFT combines active learning with…
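The abstract is truncated here, so the exact U-GIFT procedure is not shown. As a rough illustration of the general idea it names (using model uncertainty to decide which unlabeled samples are safe to pseudo-label during self-training), here is a minimal sketch. It is not the authors' algorithm: the function names, the use of predictive entropy as the uncertainty measure, and the threshold value are all assumptions made for this example. The input stands in for class probabilities produced by several stochastic forward passes of a Bayesian-style classifier.

```python
import math
from typing import List, Tuple


def predictive_entropy(mean_probs: List[float]) -> float:
    """Entropy (in nats) of an averaged class-probability vector."""
    return -sum(p * math.log(p) for p in mean_probs if p > 0.0)


def select_pseudo_labels(
    mc_probs: List[List[List[float]]],
    max_entropy: float,
) -> List[Tuple[int, int, float]]:
    """Pick unlabeled samples whose averaged prediction is low-uncertainty.

    mc_probs[t][i][c] is the probability of class c for sample i in
    stochastic forward pass t (hypothetical input layout).
    Returns (sample_index, pseudo_label, entropy) for each selected sample.
    """
    n_passes = len(mc_probs)
    n_samples = len(mc_probs[0])
    selected = []
    for i in range(n_samples):
        n_classes = len(mc_probs[0][i])
        # Average the class probabilities over all stochastic passes.
        mean = [
            sum(mc_probs[t][i][c] for t in range(n_passes)) / n_passes
            for c in range(n_classes)
        ]
        entropy = predictive_entropy(mean)
        # Keep only samples the model is confident about (assumed rule).
        if entropy <= max_entropy:
            label = max(range(n_classes), key=mean.__getitem__)
            selected.append((i, label, entropy))
    return selected


# Two stochastic passes over three unlabeled samples, two classes
# (0 = non-toxic, 1 = toxic); the numbers are made up for illustration.
example_probs = [
    [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]],
    [[0.97, 0.03], [0.30, 0.70], [0.08, 0.92]],
]
chosen = select_pseudo_labels(example_probs, max_entropy=0.35)
```

In this toy input, the second sample flips between classes across passes, so its averaged prediction has high entropy and it is left out of the pseudo-labeled set. A self-training loop would add the selected samples to the training data and retrain; how U-GIFT actually scores and selects samples should be taken from the full paper.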
Taxonomy
Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection
