SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media
Aiqi Jiang, Arkaitz Zubiaga

TL;DR
This paper introduces SexWEs, a cross-lingual semantic specialisation method that enhances Chinese word embeddings using English sexism-related knowledge, improving sexism detection in social media for low-resource languages.
Contribution
The paper presents a novel cross-lingual semantic specialisation framework that uses resources from a high-resource language (English) to improve word embeddings for sexism detection in a low-resource language (Chinese).
Findings
SexWEs outperform baseline Chinese word vectors in intrinsic similarity tasks.
SexWEs improve sexism detection accuracy in social media.
The framework effectively retrofits word vectors in low-resource languages.
Abstract
The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language -- Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to…
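To make the idea of retrofitting concrete, the following is a minimal sketch of a generic retrofitting update, in which each word vector is pulled toward the vectors of words it is linked to in an external lexicon while staying close to its original position. It is not the SexWEs method itself, whose details are not given in this summary. The function names (`retrofit`, `cosine`), the toy tokens `w1` to `w3`, the vectors, and the lexicon are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrofit(vectors, lexicon, iterations=10, alpha=1.0):
    """Generic retrofitting sketch (an assumption, not the paper's algorithm).

    vectors: dict mapping word -> list of floats (pre-trained embeddings)
    lexicon: dict mapping word -> list of related words (external knowledge)
    alpha:   weight that keeps a word close to its original vector
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, related in lexicon.items():
            neighbours = [n for n in related if n in new]
            if word not in new or not neighbours:
                continue
            beta = 1.0 / len(neighbours)  # neighbour weights sum to 1
            updated = []
            for d in range(len(new[word])):
                neighbour_sum = sum(beta * new[n][d] for n in neighbours)
                # Weighted average of the original vector and current neighbours.
                updated.append(
                    (alpha * vectors[word][d] + neighbour_sum) / (alpha + 1.0)
                )
            new[word] = updated
    return new


# Hypothetical toy data: w1 and w2 are linked in the lexicon, w3 is not.
base = {"w1": [1.0, 0.0], "w2": [0.0, 1.0], "w3": [1.0, 1.0]}
lexicon = {"w1": ["w2"], "w2": ["w1"]}

specialised = retrofit(base, lexicon)
before = cosine(base["w1"], base["w2"])
after = cosine(specialised["w1"], specialised["w2"])
```

In this toy setting, the linked pair `w1` and `w2` ends up closer together than in the original space, and `w3`, which has no lexicon entry, is left unchanged. A cross-lingual system would additionally need a way to carry constraints from English into Chinese, which this sketch does not attempt.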
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Gender Studies in Language
