Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions
Ehsan Mohammady Ardehaly, Aron Culotta

TL;DR
This paper introduces a scalable learning model that infers political sentiment and demographics from Twitter data by leveraging population-level data, reducing the need for costly individual annotations; its estimates closely track traditional polls.
Contribution
The paper proposes Weighted Label Regularization, a novel Learning from Label Proportions (LLP) model that supports hierarchical sample weighting, enabling demographic and opinion inference from social media data without individual labels.
Findings
Model achieves 28-44% error reduction compared to baselines.
Estimates align closely with traditional polling data.
Demonstrates ability to analyze linguistic and demographic interactions over time.
Abstract
Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes from Twitter data, such as political sentiment and demographic attributes. Since it is costly and time-consuming to annotate data for traditional supervised classification, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference using U.S. Census data, national and state political polls, and the Cook Partisan Voting Index as population-level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior…
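The abstract describes LLP only in terms of bags with known label proportions. The name WLR suggests it builds on label regularization, in which a classifier is trained so that its average predicted label distribution within each bag matches the known proportions under a KL-divergence penalty. The sketch below illustrates that idea for a multinomial logistic regression, with a per-instance weight inside each bag's average. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy data, the L2 term, and the weights `w` are all hypothetical, and the paper's hierarchical weighting scheme is not reproduced here.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def bag_estimate(W, X, w):
    """Weighted mean of instance posteriors p(y | x; W) within one bag."""
    a = w / w.sum()                      # normalize the per-instance weights
    return a @ softmax(X @ W)            # shape (K,)


def wlr_loss_and_grad(W, bags, l2=1e-3, eps=1e-12):
    """Weighted label-regularization objective and its gradient (a sketch).

    W    : (D, K) weight matrix of a multinomial logistic regression.
    bags : list of (X, w, target): features (n, D), hypothetical per-instance
           weights (n,), and the bag's known label proportions (K,).
    Objective: sum_b KL(target_b || p_hat_b) + (l2 / 2) * ||W||^2,
    where p_hat_b is the weighted mean posterior over bag b.
    """
    loss = 0.5 * l2 * np.sum(W * W)
    grad = l2 * W
    for X, w, target in bags:
        P = softmax(X @ W)               # (n, K) instance posteriors
        a = w / w.sum()                  # (n,) normalized weights
        p_hat = a @ P + eps              # (K,) bag-level estimate
        loss += np.sum(target * (np.log(target + eps) - np.log(p_hat)))
        r = target / p_hat               # (K,); dKL/dp_hat equals -r
        c = P @ r                        # (n,) per-instance sum_k r_k * P_ik
        G = a[:, None] * P * (c[:, None] - r[None, :])  # dLoss/dlogits, (n, K)
        grad += X.T @ G
    return loss, grad


def fit_wlr(bags, n_features, n_classes, lr=0.2, steps=1000, l2=1e-3):
    """Minimize the objective with plain gradient descent starting from W = 0."""
    W = np.zeros((n_features, n_classes))
    for _ in range(steps):
        _, g = wlr_loss_and_grad(W, bags, l2=l2)
        W -= lr * g
    return W


def make_toy_bags(rng, proportions=(0.8, 0.3), n=200):
    """Synthetic bags: class 0 near x = -2, class 1 near x = +2, plus a bias.

    Instance labels are never returned; only each bag's proportions are.
    """
    bags = []
    for p0 in proportions:
        n0 = int(round(p0 * n))
        x = np.concatenate([rng.normal(-2.0, 0.5, n0),
                            rng.normal(2.0, 0.5, n - n0)])
        X = np.column_stack([x, np.ones(n)])
        w = rng.uniform(0.5, 1.5, n)     # hypothetical per-instance weights
        bags.append((X, w, np.array([p0, 1.0 - p0])))
    return bags


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bags = make_toy_bags(rng)
    W = fit_wlr(bags, n_features=2, n_classes=2)
    for X, w, target in bags:
        print("target", target, "estimate", bag_estimate(W, X, w).round(3))
```

Note that training never sees an instance label, only each bag's proportions, which is the defining property of the LLP setting. Plain gradient descent keeps the sketch dependency-free; a quasi-Newton optimizer would be a natural substitute at scale.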
