A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches
Prasad Hajare, Sadia Kamal, Siddharth Krishnan, and Arunkumar Bagavathi

TL;DR
This paper introduces a machine learning pipeline that uses congressional speech transcripts to detect political bias in social media without relying on manually labeled data, reaching 70.5% accuracy on Twitter and 65.1% on Gab.
Contribution
The work presents a novel approach that leverages speech transcripts for bias labeling and combines cascade and text features for bias prediction in social media.
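To make the labeling idea concrete, here is a minimal sketch of how speech transcripts could stand in for hand-annotated data: a text classifier is trained on speeches tagged by the speaker's party and then applied to social media posts. This is not the authors' pipeline. The naive Bayes model, the `"D"`/`"R"` party labels, the function names, and the toy sentences are all assumptions made for illustration, and the cascade features mentioned above are not modeled.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a deliberately crude tokenizer).
    return [w for w in text.lower().split() if w.isalpha()]

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on (document, label) pairs."""
    label_counts = Counter(labels)
    word_counts = {}
    for doc, label in zip(docs, labels):
        word_counts.setdefault(label, Counter()).update(tokenize(doc))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in any speech
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy "speeches"; the party of the speaker acts as a free label.
speeches = [
    "we must cut taxes and secure the border",
    "lower taxes help small business owners",
    "we must expand healthcare and protect voting rights",
    "climate action and healthcare for working families",
]
parties = ["R", "R", "D", "D"]

model = train_nb(speeches, parties)

# Transfer the speech-trained model to unlabeled social media posts.
posts = ["cut taxes now", "healthcare for families"]
post_labels = [predict_nb(model, p) for p in posts]
```

The point of the design is that no post is ever hand-labeled: the only supervision comes from metadata already attached to the speeches. A real system would need far more data and a stronger text model than this toy classifier.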
Findings
Achieved 70.5% accuracy on Twitter data
Achieved 65.1% accuracy on Gab data
Forecasted cascade bias with about 85% accuracy
Abstract
Computational methods to model political bias in social media face several challenges due to the heterogeneity, high dimensionality, multiple modalities, and scale of the data. Political bias in social media has been studied from multiple viewpoints, such as media bias, political ideology, echo chambers, and controversies, using machine learning pipelines. Most current methods rely heavily on manually labeled ground-truth data for the underlying political bias prediction tasks. Limitations of such methods include human-intensive labeling, labels tied to only a specific problem, and the inability to determine the near-future bias state of a social media conversation. In this work, we address such problems and give machine learning approaches to study political bias in two ideologically diverse social media forums, Gab and Twitter, without the availability of human-annotated data.…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Computational and Text Analysis Methods
