Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data

Shaina Raza; Drai Paulen-Patterson; Chen Ding

arXiv:2412.14276·cs.CL·December 23, 2024·2 cites

TL;DR

This paper compares BERT-like encoder-only models with decoder-only large language models (LLMs) for fake news detection and introduces a dataset labeled with GPT-4 assistance and verified by human experts. BERT-like models generally achieve better classification performance, while LLMs are more robust to text perturbations.

Contribution

The paper presents a new dataset labeled with GPT-4 assistance and verified by human experts, and evaluates different models and AI annotation methods for fake news detection.

Findings

01. BERT-like models outperform LLMs in classification accuracy.
02. LLMs show greater robustness to text perturbations.
03. AI labels with human supervision improve classification results compared with weak labels (distant supervision).
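
As a rough illustration of how robustness to text perturbations might be measured, the sketch below perturbs each input and compares accuracy before and after. The perturbation type (swapping adjacent letters), the function names, and the `predict` callable are all assumptions made for illustration; the listing does not state which perturbations or metrics the authors used.

```python
import random


def swap_adjacent_chars(text, rate=0.1, seed=0):
    """Hypothetical perturbation: swap neighbouring letters with probability `rate`."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so a letter moves at most once
        else:
            i += 1
    return "".join(chars)


def accuracy(predict, texts, labels):
    """Fraction of texts for which predict(text) equals the gold label."""
    correct = sum(predict(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def robustness_gap(predict, texts, labels, rate=0.1, seed=0):
    """Accuracy on clean text minus accuracy on perturbed text (assumed metric)."""
    clean = accuracy(predict, texts, labels)
    perturbed_texts = [swap_adjacent_chars(t, rate, seed) for t in texts]
    return clean - accuracy(predict, perturbed_texts, labels)
```

Here `predict` stands in for any classifier that maps a string to a label. Under this assumed metric, a smaller gap would indicate a model that is less sensitive to the perturbation.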

Abstract

Fake news poses a significant threat to public opinion and social stability in modern society. This study presents a comparative evaluation of BERT-like encoder-only models and autoregressive decoder-only large language models (LLMs) for fake news detection. We introduce a dataset of news articles labeled with GPT-4 assistance (an AI-labeling method) and verified by human experts to ensure reliability. Both BERT-like encoder-only models and LLMs were fine-tuned on this dataset. Additionally, we developed an instruction-tuned LLM approach with majority voting during inference for label generation. Our analysis reveals that BERT-like models generally outperform LLMs in classification tasks, while LLMs demonstrate superior robustness against text perturbations. Compared to weak labels (distant supervision) data, the results show that AI labels with human supervision achieve better…
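
The majority-voting step mentioned in the abstract can be sketched roughly as follows: several labels are generated for the same article and the most frequent one is kept. The label strings, the tie-breaking policy, and the function name are assumptions for illustration, and the abstract does not say how many generations are sampled or how ties are handled.

```python
from collections import Counter


def majority_vote(labels, tie_break="fake"):
    """Return the most frequent label among several generations.

    `tie_break` is an assumed policy: when labels are tied, prefer it if it is
    among the tied labels, otherwise fall back to the first tied label seen.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    if len(winners) == 1:
        return winners[0]
    return tie_break if tie_break in winners else winners[0]


# Hypothetical labels from three sampled generations for one article.
votes = ["fake", "real", "fake"]
final_label = majority_vote(votes)
```

Using an odd number of generations avoids ties in the binary case, so the tie-breaking policy only matters when the number of votes is even or when more than two labels are possible.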

Code & Models

Repositories

draip96/fakenewsclassification (PyTorch, official)

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam · Layer Normalization · Softmax