Using NLP to measure democracy

Thiago Marzagão

arXiv:1502.06161 · cs.CL · February 24, 2015 · 1 citation

Open Access

TL;DR

This paper develops a machine-coded democracy index using NLP techniques on news articles, providing a replicable, scalable, and precise measure that improves upon existing indices and allows user customization.

Contribution

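Introduces the first machine-coded democracy index, the Automated Democracy Scores (ADS), built with supervised learning on news text. The index is replicable, has standard errors small enough to distinguish between cases, and comes with a web application for changing the training set.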

Findings

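01. The Wordscores algorithm outperforms the two alternatives tried (Latent Semantic Analysis and Latent Dirichlet Allocation, each combined with tree-based regression methods).

02. The ADS have standard errors small enough to distinguish between cases.

03. A web application lets anyone change the training set and see how the results change.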

Abstract

This paper uses natural language processing to create the first machine-coded democracy index, which I call Automated Democracy Scores (ADS). The ADS are based on 42 million news articles from 6,043 different sources and cover all independent countries in the 1993-2012 period. Unlike the democracy indices we have today, the ADS are replicable and have standard errors small enough to actually distinguish between cases. The ADS are produced with supervised learning. Three approaches are tried: a) a combination of Latent Semantic Analysis and tree-based regression methods; b) a combination of Latent Dirichlet Allocation and tree-based regression methods; and c) the Wordscores algorithm. The Wordscores algorithm outperforms the alternatives, so it is the one on which the ADS are based. There is a web application where anyone can change the training set and see how the results change:…
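
To make the Wordscores step concrete, here is a minimal sketch of the basic scoring rule as it is usually described: each word gets the average of the reference scores, weighted by how strongly the word is associated with each reference text, and an unscored text gets the frequency-weighted mean of its word scores. This is an illustration only, not the paper's code. The function names, the toy tokens, and the reference scores are all hypothetical, and the sketch omits the rescaling and standard-error steps that usually accompany Wordscores.

```python
from collections import Counter


def relative_freqs(tokens):
    """Relative frequency of each word within one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_scores(reference_docs, reference_scores):
    """Score each word from labelled reference documents.

    reference_docs: list of token lists (hypothetical training texts).
    reference_scores: one known score per reference document.
    """
    freqs = [relative_freqs(doc) for doc in reference_docs]
    vocab = set().union(*freqs)
    scores = {}
    for word in vocab:
        column = [f.get(word, 0.0) for f in freqs]
        total = sum(column)
        # column[r] / total is P(r | word): how likely it is that we are
        # reading reference text r, given that we observe this word.
        scores[word] = sum(
            (f_wr / total) * a_r for f_wr, a_r in zip(column, reference_scores)
        )
    return scores


def score_text(tokens, scores):
    """Frequency-weighted mean word score of an unscored text.

    Words absent from the reference texts are ignored; returns None
    when no word in the text has a score.
    """
    scored = [t for t in tokens if t in scores]
    if not scored:
        return None
    freqs = relative_freqs(scored)
    return sum(f * scores[word] for word, f in freqs.items())


# Toy example with made-up texts and scores (+1 and -1 are arbitrary labels).
refs = [["vote", "election", "vote"], ["coup", "junta", "vote"]]
labels = [1.0, -1.0]
ws = word_scores(refs, labels)
mixed = score_text(["election", "coup", "unknown"], ws)
```

In this toy setup, a word that appears only in the first reference text inherits that text's score, while a word shared by both lands in between, in proportion to its relative frequency in each. Changing the reference texts or their scores changes every downstream score, which is the kind of sensitivity the web application mentioned in the abstract is meant to expose.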

Taxonomy

Topics: Hate Speech and Cyberbullying Detection