Filtering Context Mitigates Scarcity and Selection Bias in Political Ideology Prediction

Chen Chen; Dylan Walker; Venkatesh Saligrama

arXiv:2302.00239·cs.LG·February 2, 2023

TL;DR

This paper introduces a novel supervised learning model for political ideology prediction that effectively handles scarce, biased data and out-of-distribution inputs by decomposing document embeddings into context and position vectors.

Contribution

The paper presents a new statistical model that separates neutral context from ideological position in document embeddings, enabling more accurate predictions with limited and biased data.

Findings

01. The model predicts accurately when trained on as little as 5% of biased data.

02. It outperforms state-of-the-art methods in ideology prediction.

03. Filtering out context improves prediction on out-of-distribution inputs.

Abstract

We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of making predictions on out-of-distribution inputs. This problem is motivated by the fact that manual data labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors: a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents by exclusively leveraging the predicted positional vectors. On two benchmark datasets we show that our model is capable of outputting predictions even when trained with as little as…
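
The decomposition described in the abstract can be illustrated with a toy sketch on synthetic data. This is not the authors' model: the paper trains an end-to-end network, whereas the sketch below swaps in a simple class-mean difference to estimate a single ideology axis. The function names (`fit_position_axis`, `decompose`, `predict`), the embedding dimension, and the synthetic data are all hypothetical, chosen only to show an embedding split into a context part and a position part, with labels predicted from the position part alone.

```python
import numpy as np

def fit_position_axis(X, y):
    """Estimate a unit 'ideology axis' from labeled embeddings.

    Assumption: the class-mean difference stands in for the learned
    position component; the paper instead trains a model end to end.
    """
    mu_right = X[y == 1].mean(axis=0)
    mu_left = X[y == 0].mean(axis=0)
    axis = mu_right - mu_left
    midpoint = (mu_right + mu_left) / 2.0
    return axis / np.linalg.norm(axis), midpoint

def decompose(X, axis, midpoint):
    """Split each embedding so that X = context + position."""
    scores = (X - midpoint) @ axis      # signed coordinate along the axis
    position = np.outer(scores, axis)   # component aligned with ideology
    context = X - position              # remainder, treated as neutral
    return context, position

def predict(X, axis, midpoint):
    """Predict labels from the position vectors only, ignoring context."""
    _, position = decompose(X, axis, midpoint)
    return (position @ axis > 0).astype(int)

# Synthetic data (hypothetical): 400 documents, 16-dimensional embeddings.
rng = np.random.default_rng(0)
n_docs, dim = 400, 16
y = np.arange(n_docs) % 2               # alternating labels, balanced by construction
true_axis = np.zeros(dim)
true_axis[0] = 1.0
context_true = rng.normal(size=(n_docs, dim))
context_true[:, 0] = 0.0                # context carries no ideology signal
position_true = np.outer(1.5 * (2 * y - 1), true_axis)
X = context_true + position_true

# Fit on a small labeled subset, then predict for every document.
labeled = np.arange(40)
axis, midpoint = fit_position_axis(X[labeled], y[labeled])
context, position = decompose(X, axis, midpoint)
accuracy = (predict(X, axis, midpoint) == y).mean()
print(f"accuracy using position vectors only: {accuracy:.3f}")
```

Because `predict` reads only the projection onto the estimated axis, variation in the context component does not affect the label directly. A learned decomposition, as in the paper, would presumably be needed when ideology is not captured by a single linear direction.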

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Natural Language Processing Techniques