A Study of Slang Representation Methods
Aravinda Kolla, Filip Ilievski, Hông-Ân Sandlin, Alain Mermoud

TL;DR
This paper evaluates slang representation methods for downstream social good tasks. It finds that models pre-trained on social media data perform best, and it analyzes challenges such as out-of-vocabulary words and the evolving nature of slang.
Contribution
The paper introduces a framework for studying combinations of representation learning models and knowledge resources for slang, and provides empirical insights into their performance and challenges on downstream social good tasks.
Findings
Models pre-trained on social media data outperform the other models.
Dictionaries improve static embeddings but not contextual models.
Key challenges include out-of-vocabulary words and slang variability.
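To make the out-of-vocabulary challenge concrete, here is a minimal sketch that measures how many tokens in a set of posts fall outside a fixed vocabulary. It is not the paper's method: the tokenizer, the function names (`tokenize`, `oov_rate`), the toy vocabulary, and the example posts are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase and split on runs of letters, digits, and apostrophes.

    A deliberately crude tokenizer, assumed here for illustration only.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_rate(texts, vocabulary):
    """Return (fraction of tokens missing from vocabulary, sorted distinct OOV tokens)."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    if not tokens:
        return 0.0, []
    oov = [tok for tok in tokens if tok not in vocabulary]
    return len(oov) / len(tokens), sorted(set(oov))


# Hypothetical toy vocabulary standing in for a static embedding's word list.
toy_vocabulary = {"that", "movie", "was", "so", "good", "the", "is"}

# Hypothetical posts containing slang terms the vocabulary does not cover.
toy_posts = ["That movie was so lit", "The vibe is good fr"]

rate, missing = oov_rate(toy_posts, toy_vocabulary)
print(f"OOV rate: {rate:.2f}, missing tokens: {missing}")
```

In practice the vocabulary would come from the embedding model under study, and a high rate of missing tokens on slang-heavy text is one signal that the model's word list does not cover the slang in the data.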
Abstract
Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Natural Language Processing Techniques
