# Characterization of citizens using word2vec and latent topic analysis in a large set of tweets

**Authors:** Vargas-Calderón Vladimir, Camargo Jorge

arXiv: 1904.08926 · 2019-04-22

## TL;DR

This paper presents a machine learning approach using word2vec and latent topic analysis to characterize city communities based on a large dataset of tweets, revealing insights into citizen ideas and community structures.
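
The summary names word2vec but not how it was configured. To show roughly what such word embeddings are, here is a minimal pure-Python sketch of skip-gram with negative sampling (SGNS), one of the two word2vec training modes. It is not the authors' setup. The function names `train_skipgram` and `cosine`, the toy tweets, and every hyperparameter (`dim`, `window`, `negatives`, `epochs`, `lr`) are hypothetical. Negatives are drawn uniformly here, whereas the reference word2vec samples them from a smoothed unigram distribution.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow for large |x|.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_skipgram(sentences, dim=16, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling (hypothetical helper, uniform negatives)."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Input (word) vectors start small and random; output (context) vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed (positive) pair plus `negatives` random (negative) pairs.
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for tgt, label in targets:
                        u = w_out[tgt]
                        score = sigmoid(sum(a * b for a, b in zip(v, u)))
                        g = lr * (label - score)
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # accumulate the update for the word vector
                            u[k] += g * v[k]       # update the context vector immediately
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {w: w_in[index[w]] for w in vocab}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Hypothetical, already tokenized toy "tweets".
    tweets = [
        ["traffic", "jam", "on", "the", "avenue"],
        ["bus", "stuck", "in", "traffic", "on", "the", "avenue"],
        ["great", "goal", "by", "the", "team"],
        ["the", "team", "won", "the", "match"],
    ]
    vectors = train_skipgram(tweets)
    print(round(cosine(vectors["traffic"], vectors["bus"]), 3))
```

On a real corpus of millions of tweets, a vectorized library implementation would replace this loop; the sketch only makes the training objective explicit.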

## Contribution

It introduces a novel method combining word embeddings and topic modeling to automatically detect and analyze city communities from social media data.
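
For the topic-modeling half, latent Dirichlet allocation (LDA) is a common choice. The summary does not say which topic model or inference procedure the authors used, so the sketch below is an assumption: collapsed Gibbs sampling for LDA over tokenized tweets, in plain Python. The function `lda_gibbs`, the priors `alpha` and `beta`, the iteration count, and the toy documents are hypothetical. It returns `theta` (the topic mixture of each document) and `phi` (the word distribution of each topic).

```python
import random


def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical helper).

    Returns (theta, phi, vocab): per-document topic mixtures,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # tokens in doc d assigned to topic k
    nkw = [[0] * n_words for _ in range(n_topics)]  # times word w is assigned to topic k
    nk = [0] * n_topics                             # total tokens assigned to topic k
    z = []                                          # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w_id[w], z[d][i]
                # Take the current token out of all counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][w] + beta) / (nk[k] + n_words * beta) for w in range(n_words)]
        for k in range(n_topics)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    docs = [["traffic", "bus", "avenue"], ["goal", "team", "match"],
            ["bus", "traffic", "jam"], ["team", "goal", "fans"]]
    theta, phi, vocab = lda_gibbs(docs, n_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
        print(k, [vocab[w] for w in top])
```

Each row of `theta` and `phi` is a smoothed probability distribution, so the top words of each topic give a readable label for what a group of tweets is about.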

## Key findings

- Effective characterization of city populations using tweets
- Identification of community structures through text analytics
- Demonstrated approach on a large dataset of over 2.6 million tweets

## Abstract

With the increasing use of the Internet and mobile devices, social networks are becoming the most widely used media for citizens to communicate their ideas and thoughts. This information is very useful for identifying communities with common ideas based on what they publish in the network. This paper presents a method to automatically detect city communities based on machine learning techniques applied to a set of tweets from Bogotá's citizens. An analysis was performed on a collection of 2,634,176 tweets gathered from Twitter over a period of six months. Results show that the proposed method is an interesting tool to characterize a city population based on machine learning methods and text analytics.
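
A simple way to turn word embeddings into candidate communities is to represent each user by the average vector of the words they tweeted and then cluster those user vectors. The summary does not state that this is the paper's procedure, so treat the sketch below as an assumption. `user_vector`, `kmeans`, and the 2-d word vectors (standing in for trained word2vec output) are hypothetical, and k-means is a swapped-in clustering method chosen for brevity. `math.dist` requires Python 3.8 or later.

```python
import math


def user_vector(tweets, embeddings):
    """Average the vectors of all in-vocabulary tokens a user wrote (None if there are none)."""
    vecs = [embeddings[t] for tweet in tweets for t in tweet if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2-d word vectors in place of real word2vec output.
    emb = {
        "traffic": [1.0, 0.0], "bus": [0.9, 0.1], "avenue": [0.8, 0.0],
        "goal": [0.0, 1.0], "match": [0.1, 0.9], "team": [0.0, 0.8],
    }
    users = {
        "u1": [["traffic", "bus"], ["avenue"]],
        "u2": [["goal", "team"]],
        "u3": [["bus", "avenue", "traffic"]],
        "u4": [["match", "goal"]],
    }
    names = [n for n in users if user_vector(users[n], emb) is not None]
    points = [user_vector(users[n], emb) for n in names]
    labels, _ = kmeans(points, k=2)
    print(dict(zip(names, labels)))
```

Starting the centroids from the first `k` points keeps the example deterministic; on real data, k-means++ seeding with several restarts is the more usual choice. Users whose tweets contain no in-vocabulary words have no vector and are dropped before clustering.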

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08926/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08926/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1904.08926/full.md

---
Source: https://tomesphere.com/paper/1904.08926