Detection of Conspiracy Theories Beyond Keyword Bias in German-Language Telegram Using Large Language Models

Milena Pustet; Elisabeth Steffen; Helena Mihaljević

arXiv:2404.17985·cs.CL·January 22, 2025

TL;DR

This study compares supervised and prompt-based large language model approaches for detecting conspiracy theories in German Telegram messages, achieving high accuracy without keyword bias and demonstrating adaptability over time.

Contribution

It introduces a German-language conspiracy theory detection dataset and evaluates both fine-tuning and prompt-based methods, highlighting their effectiveness and adaptability.

Findings

01

Supervised models achieved an F1 score of ~0.8.

02

Prompt-based GPT-4 achieved an F1 score of ~0.8 in a zero-shot setting.

03

Models maintained performance over temporal shifts.
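Since the findings above are reported as F1 scores, a minimal sketch of how F1 for the positive class follows from binary predictions may help. The function name `f1_score` and the toy labels are assumptions made for illustration; they are not taken from the paper, and this excerpt does not say which F1 variant (positive-class or averaged) the authors report.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are both 0 (or undefined),
    # so F1 is conventionally reported as 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (1 = conspiracy theory, 0 = other).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 (positive class): {score:.2f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class is the minority, as is typical for harmful-content detection.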

Abstract

The automated detection of conspiracy theories online typically relies on supervised learning. However, creating respective training data requires expertise, time and mental resilience, given the often harmful content. Moreover, available datasets are predominantly in English and often keyword-based, introducing a token-level bias into the models. Our work addresses the task of detecting conspiracy theories in German Telegram messages. We compare the performance of supervised fine-tuning approaches using BERT-like models with prompt-based approaches using Llama2, GPT-3.5, and GPT-4, which require little or no additional training data. We use a dataset of $\sim 4{,}000$ messages collected during the COVID-19 pandemic, without the use of keyword filters. Our findings demonstrate that both approaches can be leveraged effectively: For supervised fine-tuning, we report an F1 score of…

Code & Models

Models

digitaler-hass/TelConGBERT (Hugging Face model, 20 downloads)

Videos

Detection of Conspiracy Theories Beyond Keyword Bias in German-Language Telegram Using Large Language Models (source: underline)

Taxonomy

Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Authorship Attribution and Profiling

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Linear Layer · Label Smoothing · Adam · Layer Normalization · Attention Dropout