Using Word Embeddings in Twitter Election Classification
Xiao Yang, Craig Macdonald, Iadh Ounis

TL;DR
This study examines how the configuration of word embeddings (background training data, context window size, and dimensionality) affects Twitter election classification performance. It shows that background data of the same type as the classification dataset, a larger context window, and higher dimensionality improve results.
Contribution
It provides a systematic analysis of how training data type, window size, and embedding dimensions influence Twitter election classification accuracy.
Findings
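Background data of the same type as the classification dataset (Twitter microposts rather than Wikipedia articles) improves classification performance.
A larger context window and higher embedding dimensionality yield better results.
Word embeddings combined with a CNN outperform traditional baselines.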
Abstract
Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to train and generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, the context window size and the dimensionality of word embeddings on the classification performance. By comparing the classification results of two word embedding models, which are trained using different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data type should align with the Twitter classification dataset to…
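To make the role of the context window size concrete, here is a minimal sketch of how a symmetric window determines which (target, context) word pairs an embedding model sees during training. It assumes a word2vec-style skip-gram setup, which the abstract does not specify. The function name `context_pairs` and the example tweet are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for a symmetric context window.

    Illustrative only: real toolkits may also subsample frequent words
    or shrink the window at random, which this sketch does not do.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)                 # left edge, clipped at the start
        hi = min(len(tokens), i + window + 1)   # right edge, clipped at the end
        for j in range(lo, hi):
            if j != i:                          # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical tweet, tokenised on whitespace.
tweet = "polls open early for the election today".split()

small = context_pairs(tweet, window=1)
large = context_pairs(tweet, window=3)

print(len(small), len(large))
print(("election", "early") in small, ("election", "early") in large)
```

With the wider window, "election" is paired with words further away in the tweet, such as "early", so each word is trained against a broader context. The sketch shows only how the training pairs change, not whether classification improves.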
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Internet Traffic Analysis and Secure E-voting
Methods: Support Vector Machine
