Unsupervised Bias Detection in College Student Newspapers
Adam M. Lehavi, William McCormack, Noah Kornfeld, Solomon Glazer

TL;DR
This paper introduces an automated pipeline for scraping college newspaper archives and detecting bias by comparing the sentiment of large language model summaries with the sentiment of the original articles. The pipeline surfaces nuanced bias insights with minimal human input.
Contribution
The paper presents a novel framework that combines web scraping, dataset creation, and sentiment comparison to detect bias in college newspapers with minimal labelled data.
Findings
Successfully created a dataset of 23,154 entries from 14 student papers
Demonstrated bias detection using sentiment comparison on politically charged words
Showed the method's effectiveness in extracting nuanced bias insights
Abstract
This paper presents a pipeline, requiring minimal human influence, for scraping college newspaper archives and detecting bias in them. It introduces a framework for scraping complex archive sites from which automated tools fail to extract data, and uses it to generate a dataset of 23,154 entries from 14 student papers. The dataset can then be queried by keyword, and bias is calculated by comparing the sentiment of a large language model summary with the sentiment of the original article. The advantages of this approach are that it is less comparative than reconstruction bias and requires less labelled data than generating keyword sentiment. Results are calculated on politically charged words as well as on control words to show how conclusions can be drawn. The complete method facilitates the extraction of nuanced insights with minimal assumptions and categorizations, paving the way for a more objective understanding…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
