Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election
Roberto Mondini, Neema Kotonya, Robert L. Logan IV, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault, Alejandro Jaimes

TL;DR
Uchaguzi-2022 is a comprehensive dataset of 14,000 geotagged and categorized citizen reports on the 2022 Kenyan election, enabling scalable analysis of election-related issues using language models.
Contribution
The paper introduces Uchaguzi-2022, a novel dataset for election-related citizen reports, and explores AI methods to automate categorization and geotagging tasks.
Findings
Language models can assist in categorizing reports.
Automated geotagging shows promising accuracy.
Dataset supports AI for Social Good applications.
Abstract
Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging…
Taxonomy
Topics: Media Influence and Politics · Computational and Text Analysis Methods
