Never guess what I heard... Rumor Detection in Finnish News: a Dataset and a Baseline

Mika Hämäläinen; Khalid Alnajjar; Niko Partanen; Jack Rueter

arXiv:2106.03389 · cs.CL · June 8, 2021

TL;DR

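This paper introduces a rumor detection dataset of Finnish news headlines and evaluates two LSTM-based models and two BERT models on it. A fine-tuned FinBERT reaches the best overall accuracy (94.3%), while a model fine-tuned on Multilingual BERT reaches the best factual label accuracy (97.2%). The results suggest that this gap comes from differences in the models' original training data.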

Contribution

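The study provides a new rumor detection dataset of Finnish news headlines and benchmarks two LSTM-based models and two BERT models on it. It shows that a model's original training data affects its performance and argues that pretrained models for Finnish have been trained on small and biased corpora.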

Findings

01

Fine-tuned FinBERT reaches the best overall accuracy, 94.3%, with a rumor label accuracy of 96.0%.

02

A model fine-tuned on Multilingual BERT reaches the best factual label accuracy, 97.2%.

03

A regular LSTM model works better than an LSTM trained with a pretrained word2vec model.

Abstract

This study presents a new dataset on rumor detection in Finnish-language news headlines. We have evaluated two different LSTM-based models and two different BERT models, and have found very significant differences in the results. A fine-tuned FinBERT reaches the best overall accuracy of 94.3% and a rumor label accuracy of 96.0%. However, a model fine-tuned on Multilingual BERT reaches the best factual label accuracy of 97.2%. Our results suggest that the performance difference is due to a difference in the original training data. Furthermore, we find that a regular LSTM model works better than one trained with a pretrained word2vec model. These findings suggest that more work needs to be done on pretrained models for the Finnish language, as they have been trained on small and biased corpora.
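The abstract reports three kinds of figures: overall accuracy, rumor label accuracy, and factual label accuracy. The abstract does not spell out how the per-label figures are computed. The sketch below assumes a per-label accuracy is the share of headlines with a given gold label that the model classifies correctly. The function name `label_accuracies` and the toy labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def label_accuracies(gold, predicted):
    """Return (overall accuracy, {label: accuracy on items with that gold label}).

    Assumption: per-label accuracy is the fraction of examples carrying a
    given gold label that were predicted correctly.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")

    totals = Counter(gold)  # number of examples per gold label
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)

    overall = sum(correct.values()) / len(gold)
    per_label = {label: correct[label] / totals[label] for label in totals}
    return overall, per_label


# Toy, made-up labels purely for illustration (not data from the paper).
gold = ["rumor", "rumor", "factual", "factual", "factual"]
predicted = ["rumor", "factual", "factual", "factual", "factual"]

overall, per_label = label_accuracies(gold, predicted)
print(f"overall: {overall:.3f}")
for label, acc in sorted(per_label.items()):
    print(f"{label}: {acc:.3f}")
```

Under this reading, one model can lead on overall accuracy while another leads on a single label, which is the pattern the abstract describes for FinBERT and Multilingual BERT.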

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Tanh Activation · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout