Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis

Matyáš Boháček, Michal Bravanský, Filip Trhlík, and Václav Moravec

arXiv:2212.08550·cs.CL·December 19, 2022

TL;DR

This paper introduces the Verifee Dataset, a comprehensive collection of Czech news articles annotated for trustworthiness, along with a methodology for assessment, and demonstrates its use in training models to classify credibility levels.

Contribution

The paper presents a new interdisciplinary dataset with detailed trustworthiness annotations and a methodology for assessing news articles, advancing research in media credibility analysis.

Findings

1. Fine-tuned language models achieved a best F1 score of 0.52.

2. Collected over 10,000 unique articles from almost 60 Czech online news sources.

3. Provided open access to the dataset, methodology, and instructions.

Abstract

We present the Verifee Dataset: a novel dataset of news articles with fine-grained trustworthiness annotations. We develop a detailed methodology that assesses the texts on parameters encompassing editorial transparency, journalistic conventions, and objective reporting, while penalizing manipulative techniques. We bring aboard a diverse set of researchers from the social, media, and computer sciences to overcome the barriers and limited framing of this interdisciplinary problem. We collect over 10,000 unique articles from almost 60 Czech online news sources. These are categorized into one of the 4 classes across the credibility spectrum we propose, ranging from entirely trustworthy articles all the way to manipulative ones. We produce detailed statistics and study trends emerging throughout the set. Lastly, we fine-tune multiple popular sequence-to-sequence language models…
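
For readers unfamiliar with how an F1 score is obtained in a 4-class setting like the one described above, the following is a minimal sketch, not the authors' evaluation code. It assumes macro averaging (the page does not state which averaging the reported 0.52 uses), represents the four credibility classes as the integers 0 to 3, and uses a hypothetical function name `macro_f1` with made-up labels.

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes=4):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Illustrative labels only; these are not taken from the Verifee Dataset.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Macro averaging weights every class equally regardless of how many articles fall into it; a weighted or micro average would give a different number on imbalanced data.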

Code & Models

Datasets

verifee/czech-ing-the-news-dataset

Taxonomy

Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection