Mitigation of Diachronic Bias in Fake News Detection Dataset

Taichi Murayama; Shoko Wakamiya; Eiji Aramaki

arXiv:2108.12601·cs.CL·November 9, 2021

TL;DR

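This paper identifies diachronic bias in fake news datasets: because each dataset is tied to the period in which its news was created, models trained on it over-rely on proper nouns such as person names. The authors propose Wikidata-based masking methods to make detection models more robust to this bias.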

Contribution

It introduces a novel approach using Wikidata to mask proper nouns, mitigating diachronic bias in fake news detection datasets.
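The masking idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `PERSON_LOOKUP` dictionary is a hypothetical stand-in for data that would be retrieved from Wikidata, the names in it are fictitious, and the `[PERSON]` token and the `use_label` option are assumptions made for the example.

```python
import re

# Hypothetical stand-in for a Wikidata lookup: surface form -> coarse label.
# The entries are fictitious and only illustrate the shape of the data.
PERSON_LOOKUP = {
    "Jane Doe": "politician",
    "John Smith": "actor",
}

def mask_person_names(text, lookup, use_label=False):
    """Replace each known name with a generic token or with its coarse label."""
    # Longest names first so a full name wins over a shorter overlapping entry.
    names = sorted(lookup, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def repl(match):
        if use_label:
            return "[" + lookup[match.group(1)].upper() + "]"
        return "[PERSON]"

    return pattern.sub(repl, text)

masked = mask_person_names("Jane Doe met John Smith.", PERSON_LOOKUP)
labelled = mask_person_names("Jane Doe met John Smith.", PERSON_LOOKUP, use_label=True)
```

Replacing a name with a coarse label instead of a single generic token keeps some information about who is mentioned while removing the time-bound identity. Whether that trade-off helps is an empirical question for the paper, not for this sketch.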

Findings

01 Masking proper nouns reduces bias in datasets.

02 Proposed methods improve detection robustness.

03 Experiments show effectiveness on in-domain and out-of-domain data.

Abstract

Fake news causes significant damage to society. To deal with fake news, several studies on building detection models and arranging datasets have been conducted. Most fake news datasets depend on a specific time period. Consequently, detection models trained on such a dataset have difficulty detecting novel fake news generated by political and social changes; their output may be biased by input that includes specific person names and organization names. We refer to this problem as Diachronic Bias because it is caused by the creation date of news in each dataset. In this study, we confirm the bias, especially for proper nouns including person names, from the deviation of phrase appearances in each dataset. Based on these findings, we propose masking methods using Wikidata to mitigate the influence of person names and validate whether they…
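One rough way to look at the deviation of phrase appearances within a dataset is to measure how often each token occurs in fake versus real articles. The sketch below uses a plain proportion as a stand-in; it is not the statistic used in the paper, and the function name `label_skew`, the `min_count` threshold, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def label_skew(docs, min_count=2):
    """docs: iterable of (text, label) pairs, with label 'fake' or 'real'.

    Returns {token: fraction of that token's occurrences in fake documents},
    keeping only tokens that occur at least min_count times overall.
    """
    total, fake = Counter(), Counter()
    for text, label in docs:
        for tok in text.lower().split():
            total[tok] += 1
            if label == "fake":
                fake[tok] += 1
    return {tok: fake[tok] / n for tok, n in total.items() if n >= min_count}

# Toy data with fictitious names, only to show the shape of the output.
toy = [
    ("alice announces new policy", "fake"),
    ("alice denies report", "fake"),
    ("bob announces new budget", "real"),
]
skew = label_skew(toy)
```

A token whose score is close to 0 or 1 appears almost exclusively under one label. If such tokens are mostly person names from one period, a model can learn the name instead of the content, which is the failure mode the abstract describes.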

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Spam and Phishing Detection