# Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

**Authors:** Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz

arXiv: 1705.08038 · 2019-03-06

## TL;DR

This paper introduces an open-vocabulary method that infers human traits from unprompted language use on Facebook. The resulting language-based traits often predict non-questionnaire outcomes (e.g., liked entities, income, and IQ) better than questionnaire-based five-factor traits, while remaining nearly as stable.

## Contribution

It presents a new approach to deriving human traits directly from social media language, moving beyond predefined questionnaires and enabling large-scale, automatic personality inference.
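As a rough illustration of what deriving a trait from open-vocabulary language use can look like, the sketch below builds per-user relative word frequencies and extracts a single latent dimension with power iteration (a first principal component). This summary does not spell out the paper's actual pipeline, so the technique, the toy posts, and the function names `relative_frequencies` and `first_latent_factor` are all assumptions made for illustration, not the authors' method.

```python
import math
import random
from collections import Counter


def relative_frequencies(posts_by_user):
    """Return (vocab, rows): each user's relative word frequencies over a
    vocabulary taken from the data itself (no predefined word list)."""
    counts = {
        user: Counter(word for post in posts for word in post.lower().split())
        for user, posts in posts_by_user.items()
    }
    vocab = sorted(set().union(*counts.values()))
    rows = {}
    for user, c in counts.items():
        total = sum(c.values()) or 1  # guard against users with no words
        rows[user] = [c[w] / total for w in vocab]
    return vocab, rows


def first_latent_factor(rows, iterations=200, seed=0):
    """Power iteration for the leading principal direction of the
    mean-centered user-by-word matrix. Returns (loading, scores)."""
    users = list(rows)
    n_words = len(rows[users[0]])
    means = [sum(rows[u][j] for u in users) / len(users) for j in range(n_words)]
    centered = [[rows[u][j] - means[j] for j in range(n_words)] for u in users]

    # Random start: a uniform vector would be orthogonal to every centered
    # row here, because each row of relative frequencies sums to one.
    rng = random.Random(seed)
    loading = [rng.uniform(-1.0, 1.0) for _ in range(n_words)]

    for _ in range(iterations):
        scores = [sum(x * l for x, l in zip(row, loading)) for row in centered]
        updated = [
            sum(centered[i][j] * scores[i] for i in range(len(users)))
            for j in range(n_words)
        ]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0:
            break
        loading = [v / norm for v in updated]

    scores = [sum(x * l for x, l in zip(row, loading)) for row in centered]
    return loading, dict(zip(users, scores))


# Toy, made-up posts: two users write about leisure, two about work.
posts = {
    "a": ["party this weekend", "fun party tonight"],
    "b": ["weekend party was fun"],
    "c": ["work meeting today", "deadline at work"],
    "d": ["meeting about the deadline"],
}

vocab, rows = relative_frequencies(posts)
loading, scores = first_latent_factor(rows)
```

Here `scores` maps each user to a position on the extracted dimension, and `loading` gives each word's weight on it. The sign of a component is arbitrary, so only relative positions are meaningful; a realistic analysis would use far more users, phrases as well as single words, and several factors rather than one.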

## Key findings

- Language-based traits often predict non-questionnaire outcomes (e.g., entities someone likes, income, and intelligence quotient) better than questionnaire-based five-factor traits.
- The derived traits are nearly as stable as established personality factors.
- The approach enables large-scale, automatic personality assessment from social media data.

## Abstract

Over the past century, personality theory and research has successfully identified core sets of characteristics that consistently describe and explain fundamental differences in the way people think, feel and behave. Such characteristics were derived through theory, dictionary analyses, and survey research using explicit self-reports. The availability of social media data spanning millions of users now makes it possible to automatically derive characteristics from language use -- at large scale. Taking advantage of linguistic information available through Facebook, we study the process of inferring a new set of potential human traits based on unprompted language use. We subject these new traits to a comprehensive set of evaluations and compare them with a popular five factor model of personality. We find that our language-based trait construct is often more generalizable in that it often predicts non-questionnaire-based outcomes better than questionnaire-based traits (e.g. entities someone likes, income and intelligence quotient), while the factors remain nearly as stable as traditional factors. Our approach suggests a value in new constructs of personality derived from everyday human language use.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.08038/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1705.08038/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1705.08038/full.md

---
Source: https://tomesphere.com/paper/1705.08038