Stylistic Variation in Social Media Part-of-Speech Tagging
Murali Raghu Babu Balusu, Taha Merghani, Jacob Eisenstein

TL;DR
This study investigates how social network attributes influence stylistic variation in social media and how that variation affects part-of-speech tagging accuracy, highlighting the importance of diverse training data and model robustness.
Contribution
It provides new evidence linking social network structure to POS tagging errors and explores a mixture-of-experts model to address stylistic variation.
Findings
Tagger error rates correlate with social network structure.
Balanced training data improves POS tagging accuracy.
Mixture-of-experts model did not improve performance.
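The first two findings can be made concrete with a small sketch: measure tagger error separately for each part of the network, and build a training set that draws evenly from every part rather than from one subcommunity. This is an illustration under stated assumptions, not the authors' procedure. It assumes each author has already been assigned to a network community (the `community_of` mapping, the function names, and the per-community quota are all hypothetical). The summary does not say how the network was partitioned, how the balanced sample was drawn, or how the correlation was tested.

```python
import random
from collections import defaultdict

def balanced_sample(tweets, community_of, per_community, seed=0):
    """Draw up to `per_community` tweets from each network community.

    tweets: list of (author, text) pairs.
    community_of: dict author -> community id. Hypothetical: assumed to be
    precomputed, e.g. by running community detection on the social graph.
    """
    rng = random.Random(seed)
    by_comm = defaultdict(list)
    for author, text in tweets:
        by_comm[community_of[author]].append((author, text))
    sample = []
    for comm in sorted(by_comm):
        pool = by_comm[comm]
        # Small communities contribute everything they have.
        sample.extend(rng.sample(pool, min(per_community, len(pool))))
    return sample

def error_rate_by_community(examples, community_of):
    """Token-level tagging error rate for each community.

    examples: list of (author, gold_tags, predicted_tags); the two tag
    lists for one tweet are assumed to have equal length.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for author, gold, pred in examples:
        comm = community_of[author]
        for g, p in zip(gold, pred):
            totals[comm] += 1
            errors[comm] += int(g != p)
    return {c: errors[c] / totals[c] for c in totals}
```

For example, `error_rate_by_community([("a", ["N", "V"], ["N", "N"]), ("b", ["D", "N"], ["D", "N"])], {"a": 0, "b": 1})` returns `{0: 0.5, 1: 0.0}`: uneven error rates across communities are the kind of pattern the first finding describes.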
Abstract
Social media features substantial stylistic variation, raising new challenges for syntactic analysis of online writing. However, this variation is often aligned with author attributes such as age, gender, and geography, as well as more readily available social network metadata. In this paper, we report new evidence on the link between language and social networks in the task of part-of-speech tagging. We find that tagger error rates are correlated with network structure, with high accuracy in some parts of the network, and lower accuracy elsewhere. As a result, tagger accuracy depends on training from a balanced sample of the network, rather than training on texts from a narrow subcommunity. We also describe our attempts to add robustness to stylistic variation, by building a mixture-of-experts model in which each expert is associated with a region of the social network. While prior…
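A mixture of experts tied to network regions can be sketched as follows. Each expert produces a tag distribution for a token, and a gating function weights the experts by how strongly the author belongs to each region: p(y | x, a) = Σ_k π_k(a) p_k(y | x). The abstract only says that each expert is associated with a region of the social network. The gating form here (softmax over dot products between a hypothetical author network embedding and hypothetical region vectors) is an assumption. Each expert is represented by a fixed tag distribution standing in for a trained tagger.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate(author_vec, region_vecs):
    """Hypothetical gating: score each network region by the dot product of
    its vector with the author's network embedding, then normalize with
    softmax to get mixture weights pi_k(a)."""
    scores = [sum(a * r for a, r in zip(author_vec, region)) for region in region_vecs]
    return softmax(scores)

def mix(expert_dists, weights):
    """Combine per-expert tag distributions for one token:
    p(y | x, a) = sum_k pi_k(a) * p_k(y | x)."""
    mixed = {}
    for dist, w in zip(expert_dists, weights):
        for tag, p in dist.items():
            mixed[tag] = mixed.get(tag, 0.0) + w * p
    return mixed

def predict_tag(expert_dists, author_vec, region_vecs):
    """Return the highest-probability tag under the gated mixture."""
    mixed = mix(expert_dists, gate(author_vec, region_vecs))
    return max(mixed, key=mixed.get)

experts = [{"N": 0.9, "V": 0.1}, {"N": 0.2, "V": 0.8}]
regions = [[5.0, 0.0], [0.0, 5.0]]
print(predict_tag(experts, [1.0, 0.0], regions))  # prints N
print(predict_tag(experts, [0.0, 1.0], regions))  # prints V
```

Soft gating lets an author who sits between regions draw on several experts at once. A hard assignment of each author to a single region is the special case in which the weight vector is one-hot.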
