Fine-grained Czech News Article Dataset: An Interdisciplinary Approach to Trustworthiness Analysis
Matyáš Boháček, Michal Bravanský, Filip Trhlík, and Václav Moravec

TL;DR
This paper introduces the Verifee Dataset, a comprehensive collection of Czech news articles annotated for trustworthiness, along with a methodology for assessment, and demonstrates its use in training models to classify credibility levels.
Contribution
The paper presents a new interdisciplinary dataset with detailed trustworthiness annotations and a methodology for assessing news articles, advancing research in media credibility analysis.
Findings
Achieved a best F1 score of 0.52 with fine-tuned language models.
Collected over 10,000 unique articles from almost 60 Czech online news sources.
Provided open access to dataset, methodology, and instructions.
Abstract
We present the Verifee Dataset: a novel dataset of news articles with fine-grained trustworthiness annotations. We develop a detailed methodology that assesses the texts based on their parameters encompassing editorial transparency, journalistic conventions, and objective reporting while penalizing manipulative techniques. We bring aboard a diverse set of researchers from social, media, and computer sciences to overcome barriers and limited framing of this interdisciplinary problem. We collect over 10,000 unique articles from almost 60 Czech online news sources. These are categorized into one of the classes across the credibility spectrum we propose, ranging from entirely trustworthy articles all the way to the manipulative ones. We produce detailed statistics and study trends emerging throughout the set. Lastly, we fine-tune multiple popular sequence-to-sequence language models…
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
