Unsupervised Bias Detection in College Student Newspapers

Adam M. Lehavi; William McCormack; Noah Kornfeld; Solomon Glazer

arXiv:2309.06557·cs.CL·September 14, 2023

Open Access

TL;DR

This paper introduces an automated pipeline that scrapes college newspaper archives and detects bias by comparing the sentiment of a large language model summary of each article with the sentiment of the original article. The approach is meant to surface nuanced bias with minimal human input.

Contribution

The paper presents a framework that combines web scraping, dataset creation, and summary-versus-article sentiment comparison to detect bias in college newspapers with little labelled data.

Findings

01. Successfully created a dataset of 23,154 entries from 14 student papers

02. Demonstrated bias detection using sentiment comparison on politically charged words

03. Showed the method's effectiveness in extracting nuanced bias insights

Abstract

This paper presents a pipeline with minimal human influence for scraping and detecting bias on college newspaper archives. This paper introduces a framework for scraping complex archive sites that automated tools fail to grab data from, and subsequently generates a dataset of 14 student papers with 23,154 entries. This data can also then be queried by keyword to calculate bias by comparing the sentiment of a large language model summary to the original article. The advantages of this approach are that it is less comparative than reconstruction bias and requires less labelled data than generating keyword sentiment. Results are calculated on politically charged words as well as control words to show how conclusions can be drawn. The complete method facilitates the extraction of nuanced insights with minimal assumptions and categorizations, paving the way for a more objective understanding…
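
The summary-versus-article comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy word lexicon stands in for a real sentiment model, the summaries are supplied as plain strings rather than generated by a large language model, the `article` and `summary` field names are hypothetical, and the sign convention (article sentiment minus summary sentiment) is one possible choice.

```python
import re
from statistics import mean

# Toy lexicon, assumed for illustration only; a real pipeline would use a
# trained sentiment model instead.
POSITIVE = {"good", "great", "success", "praised", "benefit", "support"}
NEGATIVE = {"bad", "failure", "criticized", "harm", "crisis", "attack"}


def sentiment(text: str) -> float:
    """Score text in [-1, 1] as (pos - neg) / (pos + neg); 0.0 if no lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def keyword_bias(entries, keyword):
    """Mean (article sentiment - summary sentiment) over entries mentioning keyword.

    Returns None when no entry contains the keyword.
    """
    kw = keyword.lower()
    gaps = [
        sentiment(e["article"]) - sentiment(e["summary"])
        for e in entries
        if kw in e["article"].lower()
    ]
    return mean(gaps) if gaps else None


# Hypothetical entries; the summaries stand in for LLM output.
entries = [
    {"article": "The new tuition policy is a failure and a crisis for students.",
     "summary": "The article discusses the new tuition policy."},
    {"article": "The tuition committee praised the plan as a great success.",
     "summary": "The committee praised the tuition plan."},
    {"article": "The football team played on Saturday.",
     "summary": "A football game took place."},
]

if __name__ == "__main__":
    for word in ("tuition", "football", "parking"):
        print(word, keyword_bias(entries, word))
```

Under these assumptions, a score far from zero for a keyword suggests that articles mentioning it carry sentiment that a neutral summary drops, while a control word should score near zero.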

Taxonomy

Topics: Hate Speech and Cyberbullying Detection
