When Bigger Isn't Better: A Comprehensive Fairness Evaluation of Political Bias in Multi-News Summarisation
Nannan Huang, Iffat Maab, Junichi Yamagishi

TL;DR
This paper evaluates political bias in multi-news summarisation systems, revealing that mid-sized models often outperform larger ones in fairness and that debiasing effectiveness varies across methods and fairness dimensions.
Contribution
It introduces a comprehensive fairness evaluation framework for multi-news summarisation, analysing the effects of model size and debiasing strategies across multiple fairness metrics.
Findings
Mid-sized models often outperform larger ones in fairness.
Prompt-based debiasing effectiveness is model-dependent.
Entity sentiment is highly resistant to debiasing.
Abstract
Multi-document news summarisation systems are increasingly adopted for their convenience in processing vast daily news content, making fairness across diverse political perspectives critical. However, these systems can exhibit political bias through unequal representation of viewpoints, disproportionate emphasis on certain perspectives, and systematic underrepresentation of minority voices. This study presents a comprehensive evaluation of such bias in multi-document news summarisation using FairNews, a dataset of complete news articles with political orientation labels, examining how large language models (LLMs) handle sources with varying political leanings across 13 models and five fairness metrics. We investigate both baseline model performance and effectiveness of various debiasing interventions, including prompt-based and judge-based approaches. Our findings challenge the…
