UoB at SemEval-2021 Task 5: Extending Pre-Trained Language Models to Include Task and Domain-Specific Information for Toxic Span Prediction
Erik Yan, Harish Tayyar Madabushi

TL;DR
This paper extends pre-trained language models with task- and domain-specific information and adds conditional random fields (CRFs) to improve toxic span detection in social media text, scoring within 4 percentage points of the top-performing team at SemEval-2021 Task 5.
Contribution
The paper modifies pre-trained language models in two ways: it adds task- and domain-specific information, and it uses conditional random fields (CRFs) for token classification. These modifications improve toxic span detection results.
Findings
The system scored within 4 percentage points of the top-performing team on the SemEval-2021 Toxic Spans Detection task.
Incorporating task-specific information improved model performance.
CRFs proved effective for token classification in toxic span detection.
Abstract
Toxicity is pervasive in social media and poses a major threat to the health of online communities. The recent introduction of pre-trained language models, which have achieved state-of-the-art results in many NLP tasks, has transformed the way in which we approach natural language processing. However, the inherent nature of pre-training means that they are unlikely to capture task-specific statistical information or learn domain-specific knowledge. Additionally, most implementations of these models typically do not employ conditional random fields, a method for simultaneous token classification. We show that these modifications can improve model performance on the Toxic Spans Detection task at SemEval-2021 to achieve a score within 4 percentage points of the top performing team.
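To illustrate what "simultaneous token classification" means in practice, here is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation: the function name `viterbi_decode`, the two-label scheme (0 = non-toxic, 1 = toxic), and all scores are assumptions chosen for illustration. Instead of picking the best label for each token independently, the decoder scores whole label sequences using per-token emission scores plus label-to-label transition scores.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]: score of label j at token t (e.g. from a language model).
    transitions[i][j]: score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[j] = best score of any label path that ends in label j so far
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_labels):
            # Pick the previous label that gives the best path into label j
            prev = max(range(num_labels),
                       key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label
    last = max(range(num_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical scores for three tokens; labels: 0 = non-toxic, 1 = toxic.
emissions = [[0.0, 1.0], [0.6, 0.4], [0.0, 1.0]]
# Hypothetical transitions that reward keeping the same label on adjacent tokens.
transitions = [[0.5, -0.5], [-0.5, 0.5]]

independent = [max(range(2), key=lambda j: row[j]) for row in emissions]
joint = viterbi_decode(emissions, transitions)
print(independent)  # [1, 0, 1]
print(joint)        # [1, 1, 1]
```

With these made-up scores, per-token argmax breaks the span in the middle, while joint decoding keeps it contiguous because the transition scores favour adjacent tokens sharing a label. In a trained CRF the transition scores are learned from data rather than set by hand.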
Taxonomy
Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Software Engineering Research
