POLygraph: Polish Fake News Dataset

Daniel Dzienisiewicz; Filip Graliński; Piotr Jabłoński; Marek Kubis; Paweł Skórzewski; Piotr Wierzchoń

arXiv:2407.01393·cs.CL·July 2, 2024

TL;DR

The paper introduces POLygraph, a comprehensive Polish fake news dataset with annotated news articles and comments, aiming to advance fake news detection in Polish through new data resources and tools.

Contribution

The paper presents a novel, multi-part Polish fake news dataset and a software tool for analyzing content authenticity, filling a gap in resources for Polish-language fake news detection.

Findings

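1. The dataset comprises 11,360 article-label pairs ("fake-or-not") and 5,082 news articles with tweets commenting on them ("fake-they-say").
2. The data was annotated manually by both expert and non-expert annotators to support data quality.
3. The dataset provides a foundation for future fake news detection models in Polish.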

Abstract

This paper presents the POLygraph dataset, a unique resource for fake news detection in Polish. The dataset, created by an interdisciplinary team, is composed of two parts: the "fake-or-not" dataset with 11,360 pairs of news articles (identified by their URLs) and corresponding labels, and the "fake-they-say" dataset with 5,082 news articles (identified by their URLs) and tweets commenting on them. Unlike existing datasets, POLygraph encompasses a variety of approaches from source literature, providing a comprehensive resource for fake news detection. The data was collected through manual annotation by expert and non-expert annotators. The project also developed a software tool that uses advanced machine learning techniques to analyze the data and determine content authenticity. The tool and dataset are expected to benefit various entities, from public sector institutions to publishers…
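The two-part structure described in the abstract can be pictured as two record types: a "fake-or-not" record holding an article URL and its label, and a "fake-they-say" record holding an article URL and the tweets that comment on it. The sketch below is illustrative only. The tab-separated input layout, the field names (`url`, `label`, `tweets`), the label strings, and all function names are hypothetical assumptions, not the dataset's actual distribution format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeOrNotRecord:
    """One "fake-or-not" entry: an article URL and its label (field names hypothetical)."""
    url: str
    label: str  # hypothetical label strings, e.g. "fake" / "not-fake"


@dataclass
class FakeTheySayRecord:
    """One "fake-they-say" entry: an article URL and tweets commenting on it."""
    url: str
    tweets: list[str] = field(default_factory=list)


def parse_fake_or_not(lines):
    """Parse hypothetical 'url<TAB>label' lines, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        url, label = line.split("\t", 1)
        records.append(FakeOrNotRecord(url=url, label=label))
    return records


def group_tweets(pairs):
    """Group hypothetical (url, tweet_text) pairs into one record per article URL,
    preserving the order in which URLs first appear."""
    by_url = {}
    for url, tweet in pairs:
        by_url.setdefault(url, FakeTheySayRecord(url=url)).tweets.append(tweet)
    return list(by_url.values())


def label_distribution(records):
    """Count how many records carry each label."""
    return Counter(r.label for r in records)
```

For example, `label_distribution(parse_fake_or_not(["https://example.com/a\tfake", "https://example.com/b\tnot-fake", "https://example.com/c\tfake"]))` yields `Counter({'fake': 2, 'not-fake': 1})`. Keeping the two parts as separate record types mirrors the abstract's description that one part pairs URLs with labels while the other pairs URLs with commenting tweets.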

Taxonomy

Topics: Misinformation and Its Impacts