Crawling political communities in Twitter and extracting political affiliations
Muhammad Umer Gurchani

TL;DR
This paper introduces a novel focused crawling method for Twitter that effectively extracts community structures and predicts political affiliations, addressing sampling and data collection challenges in social media research.
Contribution
The paper presents a validated seed-based crawl approach that estimates community size and structure without full network data, advancing formalized data collection methods for Twitter.
Findings
Successfully separated French political communities on Twitter
Achieved accurate user political affiliation predictions
Addressed sampling size and data representativeness issues
Abstract
In theory, a major advantage of the big data approach to studying online communities is that it should be possible to collect a representative random sample from a broadly defined population. In practice, however, data collection processes are not formalized, even for well-known social media platforms such as Twitter and Facebook. As a result, questions such as "how much data is enough?" and "how representative are the samples of the broader population being studied?" remain open. In this paper, I propose a focused back-and-forth crawl approach and a validated seed choice method for collecting network-level data from Twitter. The proposed crawl method can extract community structures without needing a complete graph of the Twitter network and can validate their size using a "reference score". It also takes care of the sampling size problem in Twitter…
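To make the idea of a seed-based focused crawl concrete, here is a minimal sketch in Python on a toy in-memory follow graph. It is not the paper's algorithm: the function name `focused_crawl`, the `min_links` threshold, the stopping rule (stop when no new account qualifies), and the reading of "back-and-forth" as alternating between followees and followers are all assumptions made for illustration. The sketch does not implement the "reference score", which the abstract does not define, and it makes no calls to any Twitter API.

```python
from collections import Counter


def focused_crawl(follows, seeds, min_links=2, max_rounds=5):
    """Grow a community from seed accounts on a toy follow graph.

    follows maps each user to the set of accounts they follow.
    min_links and max_rounds are hypothetical tuning parameters.
    """
    # Reverse index: who follows each account (the "back" direction).
    followers = {}
    for user, targets in follows.items():
        for target in targets:
            followers.setdefault(target, set()).add(user)

    community = set(seeds)
    for _ in range(max_rounds):
        counts = Counter()
        for member in community:
            # "Forth": accounts the member follows.
            # "Back": accounts that follow the member.
            neighbours = follows.get(member, set()) | followers.get(member, set())
            for account in neighbours:
                if account not in community:
                    counts[account] += 1
        # Assumed admission rule: keep accounts tied to enough current members.
        new_members = {u for u, c in counts.items() if c >= min_links}
        if not new_members:
            break  # assumed stopping rule: the community stopped growing
        community |= new_members
    return community


# Toy data: two loosely connected groups, "a*" and "b*".
toy_follows = {
    "a1": {"a2", "a3"},
    "a2": {"a1", "a3"},
    "a3": {"a1", "a2", "a4"},
    "a4": {"a2", "a3"},
    "b1": {"b2", "b3"},
    "b2": {"b1", "b3"},
    "b3": {"b1", "b2", "a4"},
}

group_a = focused_crawl(toy_follows, seeds={"a1", "a2"})
group_b = focused_crawl(toy_follows, seeds={"b1", "b2"})
print(sorted(group_a))
print(sorted(group_b))
```

In this toy graph the single bridge from `b3` to `a4` is too weak to pull either group into the other, which illustrates how a link threshold can keep a crawl focused on one community without visiting the whole network.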
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · Social Media and Politics
