Studying Socially Unacceptable Discourse Classification (SUD) through different eyes: "Are we on the same page ?"

Bruno Machado Carneiro; Michele Linardi; Julien Longhi

arXiv:2308.04180·cs.CL·August 9, 2023

TL;DR

This paper introduces a new annotated corpus for Socially Unacceptable Discourse detection, evaluates classifier generalization across contexts, and discusses challenges and insights for improving SUD classification.

Contribution

It presents a novel, diverse corpus for SUD detection and analyzes how annotation differences affect classifier performance across online sources.

Findings

01

The corpus covers various online sources and SUD categories.

02

Classifier generalization varies across different annotation modalities.

03

Data insights support improved annotation and detection strategies.

Abstract

We study Socially Unacceptable Discourse (SUD) characterization and detection in online text. We first build and present a novel corpus that contains a large variety of manually annotated texts from different online sources used so far in state-of-the-art machine learning (ML) SUD detection solutions. This global context allows us to test the generalization ability of SUD classifiers that acquire knowledge around the same SUD categories, but from different contexts. From this perspective, we can analyze how (possibly) different annotation modalities influence SUD learning, and we discuss open challenges and research directions. We also provide several data insights that can support domain experts in the annotation task.
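
The cross-context generalization test described in the abstract can be illustrated with a minimal sketch: train on texts from one source, then score on every source. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the function names (`cross_source_accuracy`, `fit_token_votes`, `predict_token_votes`), the toy records, the source names, and the two labels are all hypothetical, and the token-voting classifier is only a stand-in for the ML models the paper evaluates.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (source, text, label). Invented for illustration only.
TOY_RECORDS = [
    ("forum", "you are an idiot", "sud"),
    ("forum", "thanks for the help", "ok"),
    ("forum", "what an idiot move", "sud"),
    ("forum", "great help thanks", "ok"),
    ("microblog", "total moron lol", "sud"),
    ("microblog", "nice photo lol", "ok"),
    ("microblog", "such a moron", "sud"),
    ("microblog", "nice day today", "ok"),
]


def fit_token_votes(examples):
    """Count, for each token, how many training texts of each label contain it."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        for token in set(text.lower().split()):
            token_counts[token][label] += 1
    return token_counts, label_counts


def predict_token_votes(model, text):
    """Sum the per-label counts of known tokens; fall back to the most common label."""
    token_counts, label_counts = model
    votes = Counter()
    for token in set(text.lower().split()):
        votes.update(token_counts.get(token, {}))
    if not votes:
        return label_counts.most_common(1)[0][0]
    return votes.most_common(1)[0][0]


def cross_source_accuracy(records, fit, predict):
    """Train on each source in turn and report accuracy on every source."""
    by_source = defaultdict(list)
    for source, text, label in records:
        by_source[source].append((text, label))
    scores = {}
    for train_source, train_examples in by_source.items():
        model = fit(train_examples)
        for test_source, test_examples in by_source.items():
            correct = sum(predict(model, text) == label for text, label in test_examples)
            scores[(train_source, test_source)] = correct / len(test_examples)
    return scores


scores = cross_source_accuracy(TOY_RECORDS, fit_token_votes, predict_token_votes)
for (train_source, test_source), accuracy in sorted(scores.items()):
    print(f"train={train_source:<10} test={test_source:<10} accuracy={accuracy:.2f}")
```

In this sketch the diagonal entries (same source for training and testing) are scored on the training texts themselves, so they are optimistic; a real study would hold out a test split per source. The off-diagonal entries are the ones that speak to generalization across contexts.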

Code & Models

Repositories

mlinardicyu/sud_study_different_eyes
tf · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Topic Modeling