A Twitter Dataset for Pakistani Political Discourse

Ehsan-Ul Haq; Haris Bin Zia; Reza Hadi Mogavi; Gareth Tyson; Yang K. Lu; Tristan Braud; Pan Hui

arXiv:2301.06316·cs.SI·January 18, 2023·1 cites

Open Access

TL;DR

This paper introduces the largest Pakistani Twitter dataset with over 49 million tweets from a politically active period, enabling research on bias, misinformation, and language processing in Urdu and Roman Urdu.

Contribution

The paper provides a large-scale dataset of Pakistani tweets collected after the April 2022 no-confidence vote, supporting downstream analyses such as political bias, bot detection, and misinformation.

Findings

01

The dataset includes over 49 million tweets collected after the April 2022 no-confidence vote.

02

It contains tweets in Urdu and Roman Urdu that can support language processing tasks.

03

Enables studies on political bias, misinformation, and censorship.

Abstract

We share the largest dataset for the Pakistani Twittersphere, consisting of over 49 million tweets collected during one of the most politically active periods in the country. We collected the data after the government was deposed by a no-confidence vote in April 2022. This large-scale dataset can be used for several downstream tasks related to Pakistani Twitter users, such as the analysis of political bias, bot detection, trolling behavior, mis- and disinformation, and censorship. In addition, the dataset provides a large collection of tweets in Urdu and Roman Urdu that can be used for optimizing language processing tasks.

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts