Stylistic Variation in Social Media Part-of-Speech Tagging

Murali Raghu Babu Balusu; Taha Merghani; Jacob Eisenstein

arXiv:1804.07331·cs.CL·April 23, 2018

TL;DR

This study shows that part-of-speech tagger error rates on social media text correlate with social network structure, so tagger accuracy depends on training from a balanced sample of the network rather than on texts from a narrow subcommunity.

Contribution

The paper provides new evidence linking social network structure to POS tagging errors and explores a mixture-of-experts model, with each expert associated with a region of the social network, to address stylistic variation.

Findings

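01. Tagger error rates correlate with social network structure: accuracy is high in some parts of the network and lower elsewhere.

02. Training on a balanced sample of the network yields better POS tagging accuracy than training on texts from a narrow subcommunity.

03. The mixture-of-experts model did not improve performance.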

Abstract

Social media features substantial stylistic variation, raising new challenges for syntactic analysis of online writing. However, this variation is often aligned with author attributes such as age, gender, and geography, as well as more readily available social network metadata. In this paper, we report new evidence on the link between language and social networks in the task of part-of-speech tagging. We find that tagger error rates are correlated with network structure, with high accuracy in some parts of the network, and lower accuracy elsewhere. As a result, tagger accuracy depends on training from a balanced sample of the network, rather than training on texts from a narrow subcommunity. We also describe our attempts to add robustness to stylistic variation by building a mixture-of-experts model in which each expert is associated with a region of the social network. While prior…
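To make the mixture-of-experts idea in the abstract concrete, here is a minimal sketch of how per-expert tag predictions might be combined according to an author's position in the social network. It is not the authors' architecture: the gating by dot product between an author embedding and one key vector per expert, the names `gate_weights` and `mixture_tag_distribution`, and all numbers are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(author_embedding, expert_keys):
    """Weight each expert by how well its (hypothetical) region key
    matches the author's node embedding."""
    logits = [
        sum(a * k for a, k in zip(author_embedding, key))
        for key in expert_keys
    ]
    return softmax(logits)


def mixture_tag_distribution(author_embedding, expert_keys, expert_tag_scores):
    """Mix the per-expert tag distributions for one token,
    using the gate weights for this author."""
    weights = gate_weights(author_embedding, expert_keys)
    num_tags = len(expert_tag_scores[0])
    mixed = [0.0] * num_tags
    for w, scores in zip(weights, expert_tag_scores):
        probs = softmax(scores)
        for t in range(num_tags):
            mixed[t] += w * probs[t]
    return mixed


# Toy setup (all values are made up for illustration).
TAGS = ["NOUN", "VERB", "INTJ"]
expert_keys = [[1.0, 0.0], [0.0, 1.0]]   # one key per network region
expert_tag_scores = [                    # each expert's scores for one token
    [2.0, 0.5, 0.1],                     # expert 0 prefers NOUN
    [0.1, 0.5, 2.0],                     # expert 1 prefers INTJ
]

author_embedding = [3.0, 0.0]            # author sits near region 0
dist = mixture_tag_distribution(author_embedding, expert_keys, expert_tag_scores)
predicted = TAGS[max(range(len(TAGS)), key=lambda t: dist[t])]
```

In this toy setup the same token receives a different tag distribution depending on where its author sits in the network, which is the kind of robustness to stylistic variation the abstract describes attempting. In a real tagger the expert scores would come from trained models and the author embeddings from the network itself.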
