The Readability of Tweets and their Geographic Correlation with Education
James R. A. Davenport, Robert DeLine

TL;DR
This study analyzes the readability of 17.4 million tweets using a modified Flesch score, revealing that tweet complexity correlates with regional education levels in the U.S.
Contribution
It introduces a modified readability measure for tweets and uncovers a geographic correlation between tweet complexity and local educational attainment.
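For reference, the standard Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), where higher values mean easier text. The sketch below implements that standard formula only. The paper's specific modification for tweets is not described in this summary, so it is not reproduced here. The function names `count_syllables` and `flesch_reading_ease`, the vowel-group syllable heuristic, and the regex-based tokenization are all illustrative assumptions, not the authors' code.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent when the word has other vowel groups,
    # except for words ending in 'le' (e.g. "table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard (unmodified) Flesch Reading Ease; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Short texts with long, multi-syllable words score lower (harder) under this formula. Tweets full of URLs, mentions, or emoji would likely need extra preprocessing, which this sketch does not attempt.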
Findings
Tweets are generally more difficult to read than SMS or chat messages.
This difference holds whether or not tweets contain hashtags.
Higher regional education levels correlate with more complex tweet language.
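One way such a geographic correlation could be computed is sketched below: average the readability score of geotagged tweets within each ZCTA, then take the Pearson correlation between those averages and a per-ZCTA education measure. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the input shapes (an iterable of `(zcta, score)` pairs and a dict of education values keyed by ZCTA), and the choice of Pearson's r are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def mean_score_by_zcta(tweets):
    """Average readability score per ZCTA from (zcta, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # zcta -> [sum of scores, count]
    for zcta, score in tweets:
        totals[zcta][0] += score
        totals[zcta][1] += 1
    return {zcta: total / n for zcta, (total, n) in totals.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def readability_education_correlation(tweets, education_by_zcta):
    """Correlate mean tweet readability with education across shared ZCTAs."""
    means = mean_score_by_zcta(tweets)
    shared = sorted(set(means) & set(education_by_zcta))
    education = [education_by_zcta[z] for z in shared]
    readability = [means[z] for z in shared]
    return pearson_r(education, readability)
```

Because higher Flesch scores mean easier text, a negative coefficient from this sketch would correspond to more complex language in more educated areas.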
Abstract
Twitter has rapidly emerged as one of the largest worldwide venues for written communication. Thanks to the ease with which vast quantities of tweets can be mined, Twitter has also become a source for studying modern linguistic style. The readability of text has long provided a simple method to characterize the complexity of language and the ease with which documents may be understood by readers. In this note we use a modified version of the Flesch Reading Ease formula, applied to a corpus of 17.4 million tweets. We find tweets have characteristically more difficult readability scores compared to other short-format communication, such as SMS or chat. This linguistic difference is insensitive to the presence of "hashtags" within tweets. By utilizing geographic data provided by 2% of users, joined with "ZIP Code Tabulation Area" (ZCTA) level education data from the U.S. Census, we find an…
Taxonomy
Topics: Digital Communication and Language · Text Readability and Simplification · Web Data Mining and Analysis
