Out of vocabulary words decrease, running texts prevail and hashtags coalesce: Twitter as an evolving sociolinguistic system
Suman Kalyan Maity, Bhadreswar Ghuku, Abhishek Upmanyu, Animesh Mukherjee

TL;DR
This study analyzes Twitter's linguistic evolution over time, revealing a decrease in out-of-vocabulary words, a rise in running texts, and the coalescence of hashtags, reflecting complex sociolinguistic changes.
Contribution
It provides the first comprehensive quantitative analysis of Twitter's sociolinguistic evolution, focusing on word usage, formality, and hashtag dynamics over large time scales.
Findings
Out-of-vocabulary words are decreasing over time.
Whitespace usage is decreasing, so more words are run together into running text.
Hashtags tend to coalesce and repeat, indicating linguistic evolution.
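The first two findings rest on per-period rates: the share of tokens that fall outside a dictionary, and how much whitespace tweets contain. The sketch below shows one way such rates could be computed per year. It is a minimal illustration under stated assumptions, not the authors' method: the toy `VOCAB`, the tokenizer that skips hashtags, mentions and URLs, the whitespace-per-character ratio, and the helper names `tokens` and `yearly_stats` are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabulary; a real study would use a full English dictionary.
VOCAB = {"i", "love", "this", "game", "so", "much", "good", "morning", "all"}

# Assumption: a "word" is a run of lowercase letters or apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokens(text: str) -> list[str]:
    """Lowercased word tokens; hashtags, mentions and URLs are skipped (an assumption)."""
    kept = []
    for piece in text.split():
        if piece.startswith(("#", "@", "http")):
            continue
        kept.extend(TOKEN_RE.findall(piece.lower()))
    return kept


def yearly_stats(tweets):
    """tweets: iterable of (year, text).

    Returns {year: (oov_rate, whitespace_ratio)} where oov_rate is the fraction of
    tokens not in VOCAB and whitespace_ratio is whitespace characters per character
    (both hypothetical metric definitions).
    """
    oov = defaultdict(int)
    total = defaultdict(int)
    spaces = defaultdict(int)
    chars = defaultdict(int)
    for year, text in tweets:
        toks = tokens(text)
        total[year] += len(toks)
        oov[year] += sum(1 for t in toks if t not in VOCAB)
        spaces[year] += sum(1 for c in text if c.isspace())
        chars[year] += len(text)
    return {
        y: (
            oov[y] / total[y] if total[y] else 0.0,
            spaces[y] / chars[y] if chars[y] else 0.0,
        )
        for y in total
    }
```

For example, `yearly_stats([(2010, "i love this game so much"), (2010, "gud morning all"), (2015, "ilovethisgame #nba")])` gives an OOV rate of 1/9 for 2010 (only "gud" is unknown) and 1.0 for 2015, where the run-together "ilovethisgame" is itself out of vocabulary. This shows why a real analysis would likely need to segment running text before counting OOV words.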
Abstract
Twitter is one of the most popular social media platforms. Because its data are easy to obtain, Twitter is widely used for research. Twitter is known to have evolved in many respects since its launch; nevertheless, how its own linguistic style has evolved is still relatively unknown. In this paper, we study the evolution of various sociolinguistic aspects of Twitter over large time scales. To the best of our knowledge, this is the first comprehensive study of the evolution of such aspects of this online social network (OSN). We performed quantitative analysis both at the word level and on hashtags, since the hashtag is perhaps one of the most important linguistic units of this social medium. We studied the (in)formality of Twitter's linguistic style and find that it is neither fully formal nor completely informal; while on the one hand, we observe that Out-Of-Vocabulary words are…
Taxonomy
Topics: Digital Communication and Language · Authorship Attribution and Profiling · Language and cultural evolution
