TL;DR
This paper presents Cisco's system for detecting toxic spans in online comments using transformer-based models. Its sequence tagging approach achieved competitive results in SemEval-2021 Task 5.
Contribution
It applies transformers to the new task of span-level toxicity detection and compares two formulations of that task: sequence tagging and dependency parsing.
Findings
The sequence tagging approach achieved an F1 score of 0.6922.
The sequence tagging method outperformed the dependency parsing approach.
Cisco's system ranked 7th on the shared task leaderboard.
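SemEval-2021 Task 5 scores systems on character offsets rather than tokens. As described by the task organizers, F1 is computed per post between the predicted and gold sets of toxic character offsets and then averaged over all posts. A post with no gold span scores 1 only if the system also predicts nothing. The following Python sketch illustrates that metric under this reading. The function names `span_f1` and `mean_span_f1` are hypothetical, and this is not the official scorer.

```python
from typing import Iterable, List, Set


def span_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold toxic character offsets for one post."""
    pred_set: Set[int] = set(predicted)
    gold_set: Set[int] = set(gold)
    # Post with no gold toxic span: full credit only if nothing is predicted.
    if not gold_set:
        return 1.0 if not pred_set else 0.0
    if not pred_set:
        return 0.0
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(predictions: List[List[int]], golds: List[List[int]]) -> float:
    """Average the per-post F1 over the whole evaluation set."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    scores = [span_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, if the gold span covers offsets 5 to 9 and the system predicts 5 to 11, precision is 5/7 and recall is 1. The post therefore scores 10/12, or about 0.833. Because scores are averaged per post, short posts with one missed or spurious span count as much as long ones.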
Abstract
Social network platforms are generally used to share positive, constructive, and insightful content. However, people are increasingly exposed to objectionable content such as threats, identity attacks, hate speech, insults, obscene text, offensive remarks, or bullying. Existing work on toxic speech detection focuses on binary classification or on distinguishing toxic speech among a small set of categories. This paper describes the system proposed by team Cisco for SemEval-2021 Task 5: Toxic Spans Detection, the first shared task focused on detecting the spans in English text that contribute to its toxicity. We approach this problem in two main ways: a sequence tagging approach and a dependency parsing approach. In our sequence tagging approach, we tag each token in a sentence under a particular tagging scheme. Our best performing architecture in this approach…
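A sequence tagging formulation needs a mapping between the task's character-offset annotations and per-token labels, plus a way to turn predicted tags back into offsets. The abstract does not say which tagging scheme or tokenizer the authors used. The Python sketch below therefore assumes BIO tags over whitespace tokens, which is one common choice. All function names are hypothetical. A transformer-based system would instead align labels to subword tokens.

```python
import re
from typing import List, Set, Tuple


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping each token's [start, end) character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def offsets_to_bio(text: str, toxic_offsets: Set[int]) -> List[Tuple[str, str]]:
    """Tag a token B or I if any of its characters is toxic, else O.

    Consecutive toxic tokens are treated as one span (B, then I, I, ...).
    """
    tags: List[Tuple[str, str]] = []
    prev_toxic = False
    for token, start, end in tokenize_with_offsets(text):
        toxic = any(i in toxic_offsets for i in range(start, end))
        if toxic:
            tags.append((token, "I" if prev_toxic else "B"))
        else:
            tags.append((token, "O"))
        prev_toxic = toxic
    return tags


def bio_to_offsets(text: str, tags: List[str]) -> List[int]:
    """Map token tags back to character offsets (token characters only)."""
    offsets: List[int] = []
    for (_, start, end), tag in zip(tokenize_with_offsets(text), tags):
        if tag in ("B", "I"):
            offsets.extend(range(start, end))
    return offsets


if __name__ == "__main__":
    text = "you are a stupid idiot"
    gold = set(range(10, 22))  # "stupid idiot", including the inner space
    print(offsets_to_bio(text, gold))
    print(bio_to_offsets(text, ["O", "O", "O", "B", "I"]))
```

Decoding in this sketch returns only the characters inside tagged tokens. The space between "stupid" and "idiot" is therefore dropped, even though the gold span includes it. Under the character-offset F1 used by the task, this shows up as a small loss in recall.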
