Leveraging Large Language Models and Topic Modeling for Toxicity Classification

Haniyeh Ehsani Oskouie; Christina Chance; Claire Huang; Margaret Capetz; Elizabeth Eyeson; Majid Sarrafzadeh

arXiv:2411.17876·cs.CL·November 28, 2024

TL;DR

This paper investigates how fine-tuning large language models with topic modeling can improve toxicity classification, revealing limitations of current models and emphasizing the influence of annotator bias on model performance.

Contribution

The paper combines topic modeling with fine-tuning of BERTweet and HateBERT to improve toxicity detection accuracy.
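This page does not describe the repository's actual pipeline, so the following is only a minimal sketch of the general idea of topic-conditioned classification, under loud assumptions. A keyword lookup (`TOPIC_KEYWORDS`, `assign_topic`) stands in for a real topic model, and the per-topic classifiers are plain callables standing in for fine-tuned BERTweet or HateBERT checkpoints. All names here are hypothetical.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical stand-in for a trained topic model: each topic is described
# by a small keyword set. This is NOT the paper's topic-modeling method.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "politics": {"election", "vote", "senator"},
    "sports": {"match", "goal", "team"},
}

# A classifier maps a post to 1 (toxic) or 0 (non-toxic).
Classifier = Callable[[str], int]


def assign_topic(text: str) -> str:
    """Return the topic with the largest keyword overlap, or 'general' if none match."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    best_topic, best_overlap = "general", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:  # ties keep the earlier topic
            best_topic, best_overlap = topic, overlap
    return best_topic


def classify(text: str, per_topic: Dict[str, Classifier], fallback: Classifier) -> int:
    """Route the post to its topic's classifier, or to the fallback model."""
    model = per_topic.get(assign_topic(text), fallback)
    return model(text)


if __name__ == "__main__":
    # Toy classifiers standing in for hypothetical topic-specific fine-tuned models.
    per_topic = {"politics": lambda t: 1, "sports": lambda t: 0}
    print(classify("the senator lost the vote", per_topic, fallback=lambda t: 0))
```

Routing each post to a model fine-tuned on its own topic is one plausible reading of "fine-tuning on specific topics"; the fallback model keeps posts with no confident topic assignment from being dropped.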

Findings

1. Fine-tuning models on specific topics improves F1 scores.

2. State-of-the-art models still struggle to detect toxicity accurately.

3. Annotator bias affects model training and outcomes.
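The findings are stated in terms of F1. As a reminder of what that metric measures, here is a minimal binary F1 computation with a per-topic breakdown. Treating "toxic" as the positive class and using binary (rather than macro or weighted) F1 are assumptions, since this page does not say which averaging the paper reports; the topic labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class (here 1 = toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_topic(topics: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Group examples by their (hypothetical) topic label and score each group separately."""
    grouped: Dict[str, List[List[int]]] = defaultdict(lambda: [[], []])
    for topic, t, p in zip(topics, y_true, y_pred):
        grouped[topic][0].append(t)
        grouped[topic][1].append(p)
    return {topic: f1_score(ts, ps) for topic, (ts, ps) in grouped.items()}


if __name__ == "__main__":
    y_true = [1, 0, 1, 1]
    y_pred = [1, 0, 0, 1]
    print(f1_score(y_true, y_pred))  # precision 1.0, recall 2/3
    print(f1_by_topic(["a", "a", "b", "b"], y_true, y_pred))
```

In practice a library implementation such as scikit-learn's `f1_score` would normally be used; the hand-written version just makes the precision and recall trade-off explicit.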

Abstract

Content moderation and toxicity classification are critical tasks with significant social implications. However, studies have shown that major classification models tend to magnify or reduce biases and may overlook or disadvantage certain marginalized groups in their classifications. Researchers suggest that annotators' positionality shapes the gold-standard labels, and that models trained on those labels propagate the annotators' biases. To further investigate the impact of annotator positionality, we fine-tune BERTweet and HateBERT on the dataset while using topic-modeling strategies for content moderation. The results indicate that fine-tuning the models on specific topics yields a notable improvement in F1 score compared with the predictions of other prominent classification models such as GPT-4,…
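The claim that annotator positionality propagates into gold labels can be made concrete with a small sketch. Majority-vote aggregation is an assumption here (this page does not say how the dataset's labels were aggregated), and the annotator ids and labels are invented for illustration. An annotator whose judgments diverge from the majority ends up with low agreement, meaning their perspective is largely absent from the labels a model learns from.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Mapping, Sequence


def majority_label(labels: Sequence[int]) -> int:
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def gold_labels(annotations: Sequence[Mapping[str, int]]) -> List[int]:
    """Each item maps annotator id -> label (1 = toxic); aggregate by majority vote (assumed)."""
    return [majority_label(list(item.values())) for item in annotations]


def agreement_with_gold(annotations: Sequence[Mapping[str, int]], gold: Sequence[int]) -> Dict[str, float]:
    """Fraction of each annotator's labels that match the aggregated gold label."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for item, g in zip(annotations, gold):
        for annotator, label in item.items():
            totals[annotator] += 1
            hits[annotator] += int(label == g)
    return {a: hits[a] / totals[a] for a in totals}


if __name__ == "__main__":
    # Hypothetical annotations: a3 often disagrees with the other two annotators.
    annotations = [
        {"a1": 1, "a2": 1, "a3": 0},
        {"a1": 0, "a2": 0, "a3": 1},
        {"a1": 1, "a2": 0, "a3": 1},
    ]
    gold = gold_labels(annotations)
    print(gold)  # [1, 0, 1]
    print(agreement_with_gold(annotations, gold))
```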


Code & Models

Repositories

aheldis/toxicity-classification (official)


Taxonomy

Topics: Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax