Filtering Context Mitigates Scarcity and Selection Bias in Political Ideology Prediction
Chen Chen, Dylan Walker, Venkatesh Saligrama

TL;DR
This paper introduces a supervised learning model for political ideology prediction that handles scarce, biased training data and out-of-distribution inputs by decomposing document embeddings into a neutral context vector and an ideological position vector.
Contribution
The paper presents a new statistical model that separates neutral context from ideological position in document embeddings, enabling more accurate predictions with limited and biased data.
Findings
The model predicts accurately even when trained with as little as 5% biased data.
It outperforms state-of-the-art methods in ideology prediction.
Context filtering improves out-of-distribution prediction.
Abstract
We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors: a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents by exclusively leveraging the predicted positional vectors. On two benchmark datasets we show that our model is capable of outputting predictions even when trained with as little as…
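The decomposition described in the abstract can be illustrated with a toy linear sketch. This is not the authors' end-to-end model: it assumes synthetic embeddings, a single ideology axis, and a hypothetical difference-of-class-means estimator for that axis. All function and variable names below are made up for illustration. It only shows the idea of writing each embedding as context + position and predicting from the position part alone.

```python
import numpy as np

def fit_direction(x, y):
    """Hypothetical estimator of the ideology axis: difference of class means."""
    diff = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def split_context_position(x, unit):
    """Split embeddings so that x == context + position.

    position lies along `unit`; context is the remainder, orthogonal to it.
    """
    position = np.outer(x @ unit, unit)
    context = x - position
    return context, position

# Synthetic data (an assumption for this sketch, not from the paper):
# ideology lives on coordinate 0, neutral context on the other coordinates.
rng = np.random.default_rng(0)
n, d = 400, 16
y = rng.integers(0, 2, size=n)            # binary ideology labels
context_true = rng.normal(size=(n, d))
context_true[:, 0] = 0.0                  # context carries no ideology signal
position_true = np.zeros((n, d))
position_true[:, 0] = 2.0 * (2 * y - 1)   # -2 for label 0, +2 for label 1
x = context_true + position_true          # linear superposition

unit = fit_direction(x, y)
context, position = split_context_position(x, unit)

# Predict from the position component only: threshold its coordinate along
# the estimated axis at the midpoint between the two class means.
score = position @ unit
threshold = 0.5 * (score[y == 1].mean() + score[y == 0].mean())
pred = (score > threshold).astype(int)
accuracy = (pred == y).mean()
print(f"accuracy using position vectors only: {accuracy:.3f}")
```

In this toy setting the context component is discarded at prediction time, mirroring the abstract's statement that deployment uses only the predicted positional vectors. The paper learns both vectors with a trained model rather than a fixed projection.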
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Natural Language Processing Techniques
