UoB at SemEval-2021 Task 5: Extending Pre-Trained Language Models to Include Task and Domain-Specific Information for Toxic Span Prediction

Erik Yan; Harish Tayyar Madabushi

arXiv:2110.03730 · cs.CL · October 11, 2021



TL;DR

This paper extends pre-trained language models with task- and domain-specific information and adds conditional random fields (CRFs) to improve toxic span detection in social media text, scoring within 4 percentage points of the top-performing team at SemEval-2021 Task 5.

Contribution

It modifies pre-trained language models to include task- and domain-specific information and uses CRFs for token classification; the authors report that these changes can improve toxic span detection.

Findings

01. Achieved a score within 4 percentage points of the top-performing team.
02. Enhanced model performance by incorporating task-specific information.
03. Demonstrated the effectiveness of CRFs in token classification for toxicity detection.

Abstract

Toxicity is pervasive in social media and poses a major threat to the health of online communities. The recent introduction of pre-trained language models, which have achieved state-of-the-art results in many NLP tasks, has transformed the way in which we approach natural language processing. However, the inherent nature of pre-training means that they are unlikely to capture task-specific statistical information or learn domain-specific knowledge. Additionally, most implementations of these models typically do not employ conditional random fields, a method for simultaneous token classification. We show that these modifications can improve model performance on the Toxic Spans Detection task at SemEval-2021 to achieve a score within 4 percentage points of the top performing team.
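To make "simultaneous token classification" concrete, here is a minimal sketch of Viterbi decoding for a linear-chain CRF in plain Python. It is a generic illustration, not the authors' implementation: the function name `viterbi_decode`, the two-label scheme (0 = non-toxic, 1 = toxic), and all scores are hypothetical. In a real system the emission scores would come from the language model and the transition scores would be learned.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]   -- score of label j at token t (hypothetical values here)
    transitions[i][j] -- score of moving from label i to label j
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] holds the best score of any path ending in label j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_labels):
            # Choose the previous label that maximises the path score into label j.
            prev = max(range(num_labels), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    label = max(range(num_labels), key=lambda j: best[j])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Assumed scores for three tokens; the middle token weakly prefers "non-toxic".
emissions = [[0.0, 2.0], [0.6, 0.4], [0.0, 2.0]]
no_transitions = [[0.0, 0.0], [0.0, 0.0]]         # equivalent to per-token argmax
sticky_transitions = [[1.0, -1.0], [-1.0, 1.0]]   # rewards keeping the same label

independent = viterbi_decode(emissions, no_transitions)
joint = viterbi_decode(emissions, sticky_transitions)
print(independent, joint)
```

With zero transition scores the decoder reduces to labelling each token on its own, which leaves a gap in the middle of the span. With transitions that reward label continuity, the whole sequence is scored jointly and the middle token is pulled into the surrounding toxic span.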


Code & Models

Repositories

erikdyan/toxic_span_detection (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Software Engineering Research