Bleaching Text: Abstract Features for Cross-lingual Gender Prediction

Rob van der Goot; Nikola Ljubešić; Ian Matroos; Malvina Nissim; Barbara Plank

arXiv:1805.03122 · cs.CL · May 9, 2018 · 23 cites

Open Access · 1 Repo

TL;DR

This paper introduces 'bleaching text', a transformation of lexical strings into more abstract features for cross-lingual gender prediction. Bleached models transfer across languages better than lexical models, and their predictive power is similar to that of humans.

Contribution

It proposes a text transformation, bleaching, that improves the transferability of gender prediction models across languages, and it presents a first study of how well humans perform cross-lingual gender prediction.

Findings

01

Bleached features outperform lexical models in cross-lingual transfer.

02

Humans perform similarly to bleached models in cross-lingual gender prediction.

03

Bleached features enable better cross-lingual transfer than embeddings.

Abstract

Gender prediction has typically focused on lexical and social network features, yielding good performance, but making systems highly language-, topic-, and platform-dependent. Cross-lingual embeddings circumvent some of these limitations, but capture gender-specific style less. We propose an alternative: bleaching text, i.e., transforming lexical strings into more abstract features. This study provides evidence that such features allow for better transfer across languages. Moreover, we present a first study on the ability of humans to perform cross-lingual gender prediction. We find that human predictive power proves similar to that of our bleached models, and both perform better than lexical models.
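
To make the idea of bleaching concrete, the following is a minimal sketch of how a token might be mapped to abstract features. The abstract does not list the features the authors actually use, so the three shown here (a character-shape pattern, a vowel/consonant pattern, and token length) and the names `shape`, `vowel_consonant`, and `bleach` are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumption: only ASCII vowels are recognized; accented vowels count as consonants.
VOWELS = set("aeiou")


def shape(token: str) -> str:
    """Map each character to U (upper), L (lower), D (digit) or X (other).

    Runs of the same symbol longer than two are collapsed to two,
    so 'Hello' and 'Helloooo' get the same shape.
    """
    symbols = []
    for ch in token:
        if ch.isupper():
            symbols.append("U")
        elif ch.islower():
            symbols.append("L")
        elif ch.isdigit():
            symbols.append("D")
        else:
            symbols.append("X")
    return re.sub(r"(.)\1{2,}", r"\1\1", "".join(symbols))


def vowel_consonant(token: str) -> str:
    """Map each character to V (vowel), C (other letter) or O (non-letter)."""
    symbols = []
    for ch in token.lower():
        if ch in VOWELS:
            symbols.append("V")
        elif ch.isalpha():
            symbols.append("C")
        else:
            symbols.append("O")
    return "".join(symbols)


def bleach(text: str) -> list[tuple[str, str, int]]:
    """Replace every whitespace-separated token by its abstract features."""
    return [(shape(tok), vowel_consonant(tok), len(tok)) for tok in text.split()]


if __name__ == "__main__":
    print(bleach("Hello 2018!"))
```

Because none of these features contains the original word, a classifier trained on them in one language can in principle be applied to text in another, which is the kind of transfer the abstract describes.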

Code & Models

Repositories

bplank/bleaching-text (Official)

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling