Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments
Manikandan Ravikiran, Bharathi Raja Chakravarthi, Anand Kumar Madasamy, Sangeetha Sivanesan, Ratnavel Rajalakshmi, Sajeetha Thavareesan, Rahul Ponnusamy, Shankar Mahadevan

TL;DR
This paper presents a new dataset and system results for identifying offensive spans in Tamil-English code-mixed social media comments, addressing a gap in fine-grained offensive content moderation for under-resourced languages.
Contribution
It introduces an annotated dataset for offensive span detection in code-mixed comments and evaluates multiple systems on this task, advancing fine-grained moderation tools.
Findings
Systems achieved varying accuracy in identifying offensive spans.
The dataset enables future research in offensive content detection.
Baseline models show room for improvement in span identification.
Abstract
Offensive content moderation is vital in social media platforms to support healthy online discussions. However, existing work on code-mixed Dravidian languages is limited to classifying whole comments without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of annotated data for offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social media comments annotated with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.
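To make the task concrete, the sketch below shows one way an offensive span can be represented and scored. It assumes spans are stored as half-open character offsets into the comment and that predictions are compared with gold annotations by character-level F1, a convention used in related toxic-span tasks. Neither the representation nor the metric is confirmed by this page, and the comment text and the names `spans_to_offsets` and `char_f1` are hypothetical.

```python
def spans_to_offsets(spans):
    """Expand half-open (start, end) character spans into a set of offsets."""
    offsets = set()
    for start, end in spans:
        offsets.update(range(start, end))
    return offsets


def char_f1(predicted, gold):
    """Character-level F1 between predicted and gold offset sets (assumed metric)."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical code-mixed comment; the marked phrase stands in for an offensive span.
comment = "intha padam total waste da"

gold_phrase = "total waste"
gold_start = comment.index(gold_phrase)
gold = spans_to_offsets([(gold_start, gold_start + len(gold_phrase))])

# A system that recovers only part of the gold span.
pred_phrase = "waste"
pred_start = comment.index(pred_phrase)
pred = spans_to_offsets([(pred_start, pred_start + len(pred_phrase))])

print(round(char_f1(pred, gold), 3))
```

Scoring at the character level gives partial credit when a system finds only part of an annotated span, which whole-comment classification cannot express.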
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Text Readability and Simplification
