SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams

Yves Bestgen

arXiv:2203.05355 · cs.CL · March 11, 2022

TL;DR

This paper presents a simple logistic regression approach using character and word n-grams for detecting patronizing and condescending language, achieving moderate success and highlighting the task's difficulty.

Contribution

It shows that a basic logistic regression model using only n-gram features can serve as a baseline for patronizing and condescending language (PCL) detection, and its modest results confirm how difficult the task is.
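A minimal sketch of this kind of model, in plain Python, is given below. It is not the author's implementation: the n-gram ranges (characters 1 to 5, words 1 to 2), the binary presence features scaled to unit L2 norm, the unregularized stochastic gradient descent loop, and the toy texts and labels are all assumptions chosen for illustration, since this summary does not report the actual feature settings or solver.

```python
import math

def extract_ngrams(text, char_range=(1, 5), word_range=(1, 2)):
    """Set of character and word n-grams present in `text` (ranges are assumptions)."""
    lowered = text.lower()
    feats = set()
    for n in range(char_range[0], char_range[1] + 1):
        for i in range(len(lowered) - n + 1):
            feats.add("c:" + lowered[i:i + n])  # character n-gram
    words = lowered.split()
    for n in range(word_range[0], word_range[1] + 1):
        for i in range(len(words) - n + 1):
            feats.add("w:" + " ".join(words[i:i + n]))  # word n-gram
    return feats

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(weights, bias, feats):
    # Binary features scaled to unit L2 norm, so every text has comparable magnitude.
    if not feats:
        return bias
    value = 1.0 / math.sqrt(len(feats))
    return bias + value * sum(weights.get(f, 0.0) for f in feats)

def train_logreg(texts, labels, epochs=100, lr=0.5):
    """Unregularized logistic regression fitted by plain SGD (hypothetical settings)."""
    weights, bias = {}, 0.0
    data = [(extract_ngrams(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            grad = sigmoid(score(weights, bias, feats)) - y  # d(log-loss)/dz
            bias -= lr * grad
            if feats:
                value = 1.0 / math.sqrt(len(feats))
                for f in feats:
                    weights[f] = weights.get(f, 0.0) - lr * grad * value
    return weights, bias

def predict_proba(model, text):
    weights, bias = model
    return sigmoid(score(weights, bias, extract_ngrams(text)))

# Hypothetical toy data: 1 = PCL, 0 = not PCL.
texts = [
    "poor souls who cannot help themselves",
    "we must rescue these helpless people",
    "the council approved the new budget",
    "rainfall was lower than last year",
]
labels = [1, 1, 0, 0]
model = train_logreg(texts, labels)
for t in texts:
    print(f"{predict_proba(model, t):.2f}  {t}")
```

On separable toy data like this, the training texts end up on the correct side of 0.5. On real PCL data one would instead choose the n-gram ranges and a regularization strength on held-out data.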

Findings

01

Model outperforms no-knowledge baseline

02

Performance is below top systems

03

Highlights difficulty of PCL detection

Abstract

A logistic regression model fed only with character and word n-grams is proposed for SemEval-2022 Task 4 on Patronizing and Condescending Language (PCL) Detection. It achieved average performance: well above that of a system that guesses without using any knowledge about the task, but much lower than that of the best teams. Because the proposed model is very similar to one that performed well on a task that required automatically identifying hate speech and offensive content, this paper confirms the difficulty of PCL detection.
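The comparison with a system that guesses without any knowledge of the task can be made concrete. The sketch below assumes the score is the F1 of the positive (PCL) class and that the guesser labels each text positive with a fixed probability p, independently of its content; both are assumptions, since this summary does not say which baseline or metric was used. If positive texts make up a fraction π of the data, such a guesser has expected precision π and expected recall p, so computing F1 from the expected counts gives 2πp/(π + p), which is largest, at 2π/(1 + π), when every text is labeled positive.

```python
def f1_positive(y_true, y_pred):
    """F1 score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def expected_random_f1(prevalence, p_positive):
    """F1 from expected counts of a guesser that says 'positive' with prob. p_positive."""
    if prevalence + p_positive == 0:
        return 0.0
    # Expected precision = prevalence, expected recall = p_positive.
    return 2 * prevalence * p_positive / (prevalence + p_positive)

# Hypothetical labels with 20% positives, scored against an always-positive guesser.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(f1_positive(y_true, [1] * len(y_true)))  # about 0.333
print(expected_random_f1(0.2, 1.0))            # about 0.333
print(expected_random_f1(0.2, 0.5))            # about 0.286
```

Under these assumptions, a trained classifier is informative only to the extent that it beats this value for the actual positive-class prevalence of the test set.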

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism

Methods: Logistic Regression