Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms
Himel Ghosh

TL;DR
This study compares two transformer-based news bias detection models using SHAP explanations, revealing how their decision mechanisms differ and highlighting the importance of interpretability for improving model reliability in journalism.
Contribution
It provides a detailed interpretability analysis of bias detection models, showing how architecture and training influence their decision processes and error patterns.
Findings
Both models attend to similar categories of evaluative language.
The bias detector over-flags neutral content, producing false positives.
The domain-adapted model reduces false positives by 63%.
Abstract
Automated bias detection in news text is heavily used to support journalistic analysis and media accountability, yet little is known about how bias detection models arrive at their decisions or why they fail. In this work, we use SHAP-based explanations to present a comparative interpretability study of two transformer-based bias detection models: a bias detector fine-tuned on the BABE dataset and a domain-adapted pre-trained RoBERTa model that is also fine-tuned on BABE. We analyze word-level attributions across correct and incorrect predictions to characterize how different model architectures operationalize linguistic bias. Our results show that although both models attend to similar categories of evaluative language, they differ substantially in how these signals are integrated into predictions. The bias detector model assigns stronger internal evidence to false positives than to…
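To make the idea of word-level attribution concrete, here is a minimal sketch of exact Shapley values over the tokens of one sentence. It is an illustration under stated assumptions, not the paper's pipeline: `toy_bias_score`, `TOY_WEIGHTS`, and the example sentence are hypothetical stand-ins for a real classifier's bias logit, and SHAP itself approximates these values rather than enumerating every token subset as done here.

```python
from itertools import combinations
from math import factorial

# Hypothetical lexicon of evaluative words with made-up weights.
# These numbers are illustrative only and do not come from the paper.
TOY_WEIGHTS = {"disastrous": 2.0, "radical": 1.5, "claims": 0.5}


def toy_bias_score(tokens):
    """Stand-in for a classifier's bias logit (an assumption, not a real model)."""
    score = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    # A small interaction term, so the score is not purely additive.
    if "radical" in tokens and "disastrous" in tokens:
        score += 1.0
    return score


def shapley_attributions(tokens, score_fn):
    """Exact Shapley value per token position.

    A token is "masked" by dropping it from the input. The cost is
    exponential in the number of tokens, so this only suits short inputs.
    """
    n = len(tokens)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude token i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [tokens[j] for j in sorted(subset)]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                phi += weight * (score_fn(with_i) - score_fn(without_i))
        values.append(phi)
    return values


if __name__ == "__main__":
    sentence = ["the", "radical", "plan", "is", "disastrous"]
    for token, phi in zip(sentence, shapley_attributions(sentence, toy_bias_score)):
        print(f"{token:>12s}  {phi:+.3f}")
```

In this toy setup the attributions sum to the difference between the score of the full sentence and the score of the empty input, and the interaction bonus is split evenly between "radical" and "disastrous". Comparing such per-token values between correct and incorrect predictions is the kind of analysis the abstract describes, here reduced to a single hand-built scorer.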
Taxonomy
Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Topic Modeling
