The Surprising Performance of Simple Baselines for Misinformation Detection

Kellin Pelrine; Jacob Danovitch; Reihaneh Rabbany

arXiv:2104.06952·cs.CL·April 15, 2021

TL;DR

This paper shows that transformer models with basic fine-tuning are competitive with, and can even outperform, more complex recently proposed methods for misinformation detection, which underscores the importance of strong baselines and careful dataset design.

Contribution

The paper provides a comprehensive benchmark showing that simple transformer baselines are competitive, and it discusses dataset issues, such as data leakage, that affect the evaluation of misinformation detection methods.

Findings

01. Transformer models with basic fine-tuning can outperform more complex methods.

02. A classifier given only tweet IDs can achieve state-of-the-art results because of data leakage.

03. The results highlight the need for careful dataset design and evaluation protocols.
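The tweet-ID finding suggests a simple sanity check: train a trivial classifier on the ID alone and see whether it beats chance. The sketch below is illustrative only and is not the paper's procedure. It uses synthetic data, and it assumes one possible leakage mechanism (the two classes occupying different ID ranges, for example because they were collected at different times). All function names (`fit_threshold`, `make_synthetic`, `id_only_test_accuracy`) are hypothetical.

```python
import random


def fit_threshold(ids, labels):
    """Fit a one-feature rule on the ID alone.

    Returns (threshold, label_below): predict label_below when
    id <= threshold, otherwise the other class.
    """
    pairs = sorted(zip(ids, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    best_correct, best_rule = -1, None
    pos_below = 0
    for i, (tweet_id, label) in enumerate(pairs, start=1):
        pos_below += label
        neg_below = i - pos_below
        pos_above = total_pos - pos_below
        neg_above = (n - i) - pos_above
        # Try both orientations of the threshold rule.
        for label_below, correct in ((1, pos_below + neg_above),
                                     (0, neg_below + pos_above)):
            if correct > best_correct:
                best_correct, best_rule = correct, (tweet_id, label_below)
    return best_rule


def predict(rule, ids):
    threshold, label_below = rule
    return [label_below if i <= threshold else 1 - label_below for i in ids]


def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def make_synthetic(n, leaky, rng):
    """Synthetic (id, label) pairs; NOT real tweet data.

    If leaky, the two classes are drawn from disjoint ID ranges
    (an assumed leakage mechanism, for illustration only).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        if leaky and label == 1:
            tweet_id = rng.randint(600_000, 1_000_000)
        elif leaky:
            tweet_id = rng.randint(0, 500_000)
        else:
            tweet_id = rng.randint(0, 1_000_000)
        data.append((tweet_id, label))
    return data


def id_only_test_accuracy(data, rng):
    """Random 80/20 split, fit on train IDs, score on held-out IDs."""
    rng.shuffle(data)
    split = int(0.8 * len(data))
    train, test = data[:split], data[split:]
    rule = fit_threshold([i for i, _ in train], [l for _, l in train])
    return accuracy(predict(rule, [i for i, _ in test]), [l for _, l in test])


rng = random.Random(0)
leaky_acc = id_only_test_accuracy(make_synthetic(2000, True, rng), rng)
clean_acc = id_only_test_accuracy(make_synthetic(2000, False, rng), rng)
print(f"ID-only accuracy, leaky data: {leaky_acc:.2f}")
print(f"ID-only accuracy, clean data: {clean_acc:.2f}")
```

If an ID-only probe like this scores well above chance on a real benchmark, the identifier carries label information that a content-based model should not need, and the split or the collection process deserves a closer look before any model comparison is trusted.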

Abstract

As social media becomes increasingly prominent in our day-to-day lives, it is increasingly important to detect informative content and prevent the spread of disinformation and unverified rumours. While many sophisticated and successful models have been proposed in the literature, they are often compared with older NLP baselines such as SVMs, CNNs, and LSTMs. In this paper, we examine the performance of a broad set of modern transformer-based language models and show that with basic fine-tuning, these models are competitive with and can even significantly outperform recently proposed state-of-the-art methods. We present our framework as a baseline for creating and evaluating new methods for misinformation detection. We further study a comprehensive set of benchmark datasets, and discuss potential data leakage and the need for careful design of the experiments and understanding of…
