# Linguistic Diversities of Demographic Groups in Twitter

**Authors:** Pantelis Vikatos, Johnnatan Messias, Manoel Miranda, Fabricio Benevenuto

arXiv: 1705.03926 · 2017-05-12

## TL;DR

This study infers the gender and race of U.S. Twitter users with Face++ image processing, then compares linguistic features and topics of interest across the resulting demographic groups. It finds clear differences in language use and topics across gender and racial categories.

## Contribution

The paper introduces a methodology that combines image-based demographic inference with linguistic analysis to characterize language differences across demographic groups on Twitter.

## Key findings

- Distinct linguistic styles identified among demographic groups
- Variation in topics of interest across gender and racial lines
- Clear differences in writing attributes and phrase usage

## Abstract

The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups in Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e., male/female, Asian/Black/White) differ in terms of linguistic styles and also their interests. We extract linguistic features from 6 categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus) in order to identify the similarities and differences in particular sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen across both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.
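The absolute ranking difference of top phrases mentioned in the abstract can be illustrated with a small sketch. This summary does not give the paper's exact definition, so the code below assumes one plausible reading: rank each group's phrases by frequency, then take the absolute difference of the two ranks for every phrase both groups use. The function names, the tie-breaking rule, and the toy phrase lists are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_phrases(phrases):
    """Map each phrase to its 1-based rank by descending frequency.

    Ties are broken alphabetically (an assumption made here so that
    the ranking is deterministic).
    """
    counts = Counter(phrases)
    ordered = sorted(counts, key=lambda p: (-counts[p], p))
    return {phrase: i + 1 for i, phrase in enumerate(ordered)}


def absolute_rank_difference(group_a, group_b):
    """Return |rank_a - rank_b| for every phrase that both groups use."""
    rank_a = rank_phrases(group_a)
    rank_b = rank_phrases(group_b)
    shared = rank_a.keys() & rank_b.keys()
    return {p: abs(rank_a[p] - rank_b[p]) for p in shared}


# Toy data, invented for illustration only.
group_a = ["good morning"] * 3 + ["game day"] * 2 + ["new post"]
group_b = ["new post"] * 3 + ["good morning"] * 2 + ["game day"]

diffs = absolute_rank_difference(group_a, group_b)
for phrase in sorted(diffs, key=lambda p: (-diffs[p], p)):
    print(phrase, diffs[phrase])
```

A larger value means a phrase sits at very different positions in the two groups' rankings. Restricting the comparison to shared phrases is a design choice of this sketch; phrases used by only one group would need a separate convention, such as assigning them a rank just past the end of the other group's list.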

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03926/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03926/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1705.03926/full.md

---
Source: https://tomesphere.com/paper/1705.03926