SexWEs: Domain-Aware Word Embeddings via Cross-lingual Semantic Specialisation for Chinese Sexism Detection in Social Media

Aiqi Jiang; Arkaitz Zubiaga

arXiv:2211.08447·cs.CL·April 3, 2023


TL;DR

This paper introduces SexWEs, a cross-lingual semantic specialisation method that enhances Chinese word embeddings using English sexism-related knowledge, improving sexism detection in social media for low-resource languages.

Contribution

It presents a novel cross-lingual semantic specialisation framework that leverages high-resource language resources to improve low-resource language embeddings for sexism detection.

Findings

1. SexWEs outperform baseline Chinese word vectors in intrinsic similarity tasks.

2. SexWEs improve sexism detection accuracy in social media.

3. The framework effectively retrofits word vectors in low-resource languages.

Abstract

The goal of sexism detection is to mitigate negative online content targeting certain gender groups of people. However, the limited availability of labeled sexism-related datasets makes it problematic to identify online sexism for low-resource languages. In this paper, we address the task of automatic sexism detection in social media for one low-resource language -- Chinese. Rather than collecting new sexism data or building cross-lingual transfer learning models, we develop a cross-lingual domain-aware semantic specialisation system in order to make the most of existing data. Semantic specialisation is a technique for retrofitting pre-trained distributional word vectors by integrating external linguistic knowledge (such as lexico-semantic relations) into the specialised feature space. To do this, we leverage semantic resources for sexism from a high-resource language (English) to…
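To make the idea of retrofitting concrete, here is a minimal sketch of a generic neighbour-averaging update: each word vector is pulled toward the mean of its lexicon neighbours while staying anchored to its original position. This is an illustration only, not the SexWEs method, which is cross-lingual and domain-aware. The function names `retrofit` and `cosine`, the equal 0.5 weighting, and the toy vectors and lexicon are all assumptions made for this example.

```python
import numpy as np


def retrofit(vectors, lexicon, iterations=10):
    """Generic retrofitting sketch (an assumption, not the paper's algorithm).

    vectors: dict mapping word -> 1-D NumPy array (pre-trained embedding)
    lexicon: dict mapping word -> list of semantically related words
    """
    # Keep untouched copies of the originals so every update stays anchored.
    original = {w: np.asarray(v, dtype=float).copy() for w, v in vectors.items()}
    fitted = {w: v.copy() for w, v in original.items()}

    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            known = [n for n in neighbours if n in fitted]
            if word not in fitted or not known:
                continue  # nothing to pull this word toward
            neighbour_mean = np.mean([fitted[n] for n in known], axis=0)
            # Equal weight on the original vector and the neighbour average.
            fitted[word] = 0.5 * (original[word] + neighbour_mean)
    return fitted


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy data: word_a and word_b are listed as related, word_c is not.
toy_vectors = {
    "word_a": np.array([1.0, 0.0]),
    "word_b": np.array([0.0, 1.0]),
    "word_c": np.array([-1.0, 0.0]),
}
toy_lexicon = {"word_a": ["word_b"], "word_b": ["word_a"]}

specialised = retrofit(toy_vectors, toy_lexicon)
before = cosine(toy_vectors["word_a"], toy_vectors["word_b"])
after = cosine(specialised["word_a"], specialised["word_b"])
print(f"similarity before: {before:.2f}, after: {after:.2f}")
```

In this toy setting the two related words move closer together, while the word with no lexicon entry keeps its original vector. A specialisation system like the one the abstract describes would draw its constraints from curated lexico-semantic resources instead of a hand-written dictionary.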


Code & Models

Repositories

aggiejiang/sexwes (Official)


Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Gender Studies in Language