TL;DR
This paper compares keyword-based, attention-based, named-entity-based, transformer-based, and ensemble models for toxic span detection, with the aim of making toxicity models more interpretable and supporting human moderators.
Contribution
It compares multiple modeling techniques and contextual embeddings, and presents an ensemble approach that achieves an F1 of 0.684 in the evaluation phase of the SemEval-2021 shared task on toxic spans detection.
Findings
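The ensemble model, the best approach, achieved an F1 of 0.684 in the competition's evaluation phase.
Attention-based, named-entity-based, and transformer-based models were also evaluated.
Keyword-based models served as the initial baselines.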
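To make the keyword-based baseline idea concrete, here is a minimal sketch of how such a model might mark toxic spans as character offsets. The lexicon `TOXIC_KEYWORDS` and the function `keyword_spans` are hypothetical names chosen for illustration; the paper's actual word list and matching rules are not given on this page.

```python
import re

# Hypothetical lexicon for illustration only; not the paper's word list.
TOXIC_KEYWORDS = {"idiot", "stupid"}


def keyword_spans(text, keywords=TOXIC_KEYWORDS):
    """Return the sorted character offsets of every word found in the lexicon."""
    offsets = []
    for match in re.finditer(r"\w+", text):
        # Case-insensitive lookup of each word token in the lexicon.
        if match.group().lower() in keywords:
            offsets.extend(range(match.start(), match.end()))
    return offsets


if __name__ == "__main__":
    print(keyword_spans("You are a stupid idiot."))
```

A lookup like this ignores context entirely, which is presumably why it serves only as a starting point before the learned models.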
Abstract
Detecting which parts of a sentence contribute to its toxicity, rather than providing only a sentence-level verdict of hatefulness, would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents the methodology and results of our team, UTNLP, in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best-performing setting. The experiments start with keyword-based models and are followed by attention-based, named entity-based, transformer-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition's evaluation phase.
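The abstract does not say how the F1 score is computed. The sketch below assumes the character-offset F1 usually described for this shared task: each post is scored by comparing the set of predicted toxic character offsets against the gold set, and the per-post scores are averaged. The function names `span_f1` and `mean_span_f1` are illustrative, not taken from the paper or from the official scorer.

```python
def span_f1(predicted, gold):
    """F1 between two collections of character offsets for a single post."""
    pred_set, gold_set = set(predicted), set(gold)
    if not gold_set:
        # Assumed convention: an empty prediction is perfect when nothing is toxic.
        return 1.0 if not pred_set else 0.0
    if not pred_set:
        return 0.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(predictions, golds):
    """Average the per-post F1 over a dataset of offset lists."""
    scores = [span_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

Under this assumption, a reported score such as 0.684 would be the mean of per-post values, so posts with no toxic span still contribute either 1 or 0 to the average.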
