Understanding State Preferences With Text As Data: Introducing the UN General Debate Corpus
Alexander Baturo, Niheer Dasandi, Slava J. Mikhaylov

TL;DR
This paper introduces the UN General Debate Corpus (UNGDC), a new dataset of country statements delivered during the annual UN General Debate, which enables text-analytic study of governments' preferences and positions on global issues.
Contribution
It provides a novel dataset and demonstrates how to extract country positions from speeches, advancing research in international politics using text data.
Findings
The dataset includes over 7,700 English-language country statements from 1970 to 2016.
Text-analytic methods can be used to derive country positions on different policy dimensions from the speeches.
Applications show the dataset's potential for studying international relations.
Abstract
Every year at the United Nations, member states deliver statements during the General Debate discussing major issues in world politics. These speeches provide invaluable information on governments' perspectives and preferences on a wide range of issues, but have largely been overlooked in the study of international politics. This paper introduces a new dataset consisting of over 7,701 English-language country statements from 1970-2016. We demonstrate how the UN General Debate Corpus (UNGDC) can be used to derive country positions on different policy dimensions using text analytic methods. The paper provides applications of these estimates, demonstrating the contribution the UNGDC can make to the study of international politics.
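To make "deriving country positions from text" concrete, here is a minimal sketch of one well-known text-scaling approach, Wordscores (Laver, Benoit and Garry, 2003). It is an illustration only: the abstract does not say which method the authors use, and the function names (`word_scores`, `score_text`), the toy statements, and the reference positions of -1 and +1 are all hypothetical. The idea is to score each word by the positions of the reference texts it appears in, then score a new text as the frequency-weighted mean of its word scores.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real speeches would need punctuation
    # handling, stop-word removal, and similar preprocessing.
    return text.lower().split()


def relative_freqs(tokens, vocab=None):
    # Relative frequency of each token, optionally restricted to `vocab`.
    counts = Counter(t for t in tokens if vocab is None or t in vocab)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def word_scores(reference_texts, reference_positions):
    # For each word w: S_w = sum_r P(r | w) * A_r, where A_r is the assumed
    # position of reference text r and P(r | w) is proportional to the
    # relative frequency of w in r.
    freqs = [relative_freqs(tokenize(t)) for t in reference_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        column = [f.get(w, 0.0) for f in freqs]
        total = sum(column)
        scores[w] = sum(p / total * a for p, a in zip(column, reference_positions))
    return scores


def score_text(text, scores):
    # Position of an unscored text: frequency-weighted mean of the scores of
    # the words it shares with the reference vocabulary.
    freqs = relative_freqs(tokenize(text), vocab=scores)
    if not freqs:
        return None  # no overlap with the reference vocabulary
    return sum(f * scores[w] for w, f in freqs.items())


# Hypothetical toy statements anchoring the two ends of a made-up dimension.
references = [
    "sovereignty security sovereignty",
    "development cooperation development",
]
positions = [-1.0, 1.0]

scores = word_scores(references, positions)
estimate = score_text("development cooperation security", scores)
```

In this toy setup the new statement shares two words with the +1 reference and one with the -1 reference, so its estimate falls between the anchors, closer to +1. Applied to real General Debate statements, the choice of reference texts and their assigned positions would drive the results and would need substantive justification.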
