Gender Prediction from Tweets: Improving Neural Representations with Hand-Crafted Features

Erhan Sezerer; Ozan Polatbilek; Selma Tekir

arXiv:1908.09919 · cs.CL · September 9, 2019 · 1 citation

TL;DR

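This paper introduces an RNN with Attention model (RNNwA) that predicts the gender of a Twitter user from their tweets. Concatenating LSA-reduced n-gram features with the learned user representation yields state-of-the-art results on English and competitive results on Spanish and Arabic.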

Contribution

It combines word-level and tweet-level attention with hand-crafted, LSA-reduced n-gram features to improve gender prediction accuracy from Twitter data.

Findings

01

State-of-the-art performance on English gender prediction.

02

Competitive results on Spanish and Arabic datasets.

03

The enhanced model (RNNwA + n-gram) improves on the base RNNwA model.

Abstract

Author profiling is the characterization of an author through key attributes such as gender, age, and language. In this paper, an RNN model with Attention (RNNwA) is proposed to predict the gender of a Twitter user from their tweets. Both word-level and tweet-level attention are used to learn 'where to look'. This model (https://github.com/Darg-Iztech/gender-prediction-from-tweets) is improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user. Both models are tested on three languages: English, Spanish, and Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and competitive results on Spanish and Arabic.
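
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: random arrays stand in for RNN hidden states and n-gram counts, attention scores are assumed to be a plain dot product with a context vector (the paper's scoring function may differ), and all names (`attention_pool`, `lsa_reduce`, `user_vector`, `word_ctx`, `tweet_ctx`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(states, context):
    # states: (n, d) vectors; context: (d,) vector (learned in a real model).
    # Assumption: dot-product scoring; returns the weighted sum and the weights.
    weights = softmax(states @ context)
    return weights @ states, weights

def lsa_reduce(counts, k):
    # LSA as a truncated SVD of the (users x n-grams) matrix, keeping k dimensions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

rng = np.random.default_rng(0)
n_users, n_tweets, n_words, d = 4, 3, 5, 8

# Stand-in for RNN hidden states: one d-dimensional vector per word.
hidden = rng.normal(size=(n_users, n_tweets, n_words, d))
word_ctx = rng.normal(size=d)    # hypothetical word-level context vector
tweet_ctx = rng.normal(size=d)   # hypothetical tweet-level context vector

def user_vector(user_states):
    # Word-level attention turns each tweet into a single vector...
    tweet_vecs = np.stack([attention_pool(t, word_ctx)[0] for t in user_states])
    # ...and tweet-level attention turns the tweets into one user vector.
    return attention_pool(tweet_vecs, tweet_ctx)[0]

neural = np.stack([user_vector(u) for u in hidden])           # shape (4, 8)

# Stand-in n-gram count matrix (users x 20 n-grams), reduced with LSA.
ngram_counts = rng.poisson(1.0, size=(n_users, 20)).astype(float)
ngram_lsa = lsa_reduce(ngram_counts, k=3)                     # shape (4, 3)

# Concatenate the neural and hand-crafted representations per user.
combined = np.concatenate([neural, ngram_lsa], axis=1)        # shape (4, 11)
```

In a full model, `combined` would be passed to a classification layer that outputs the gender prediction, and the context vectors would be trained jointly with the RNN rather than sampled at random.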

Code & Models

Repositories

Darg-Iztech/gender-prediction-from-tweets
tf · Official

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection