BLUFF: Benchmarking the Detection of False and Synthetic Content across 58 Low-Resource Languages

Jason Lucas; Matt Murtagh-White; Adaku Uchendu; Ali Al-Lawati; Michiharu Yamashita; Dominik Macko; Ivan Srba; Robert Moro; Dongwon Lee

arXiv:2603.00634 · cs.CL · March 3, 2026

TL;DR

BLUFF is a large-scale multilingual benchmark dataset designed to evaluate false and synthetic content detection across 79 languages, addressing the gap in low-resource language coverage and providing tools for advancing equitable misinformation detection.

Contribution

The paper introduces BLUFF, a comprehensive multilingual benchmark dataset covering 79 languages, with novel content generation and filtering methods, to improve false content detection in low-resource languages.

Findings

01 State-of-the-art detectors degrade by up to 25.3% F1 on low-resource languages.

02 BLUFF covers both high-resource "big-head" (20) and low-resource "long-tail" (59) languages, filling a critical research gap.

03 Extensive linguistically oriented evaluation and open-source tools are provided.
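
The F1 degradation in finding 01 can be illustrated with a minimal sketch. This is not the paper's evaluation code: the labels and predictions below are made-up toy values, and the helper names `f1_score` and `f1_gap` are hypothetical. The listing does not say whether the 25.3% figure is an absolute or a relative drop, so the sketch reports both.

```python
# Toy illustration of measuring an F1 drop between two language groups.
# All data here is invented for the example; nothing is taken from BLUFF.

def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_gap(f1_high, f1_low):
    """Return the (absolute, relative) F1 drop from high- to low-resource."""
    absolute = f1_high - f1_low
    relative = absolute / f1_high if f1_high else 0.0
    return absolute, relative

# Hypothetical gold labels (1 = false content) and detector predictions.
high_true, high_pred = [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 1]
low_true, low_pred = [1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 1, 1]

f1_high = f1_score(high_true, high_pred)
f1_low = f1_score(low_true, low_pred)
abs_drop, rel_drop = f1_gap(f1_high, f1_low)

print(f"high-resource F1: {f1_high:.3f}")
print(f"low-resource F1:  {f1_low:.3f}")
print(f"absolute drop:    {abs_drop:.3f}")
print(f"relative drop:    {rel_drop:.1%}")
```

Reporting both quantities avoids ambiguity when comparing detectors whose high-resource baselines differ.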

Abstract

Multilingual falsehoods threaten information integrity worldwide, yet detection benchmarks remain confined to English or a few high-resource languages, leaving low-resource linguistic communities without robust defense tools. We introduce BLUFF, a comprehensive benchmark for detecting false and synthetic content, spanning 79 languages with over 202K samples, combining human-written fact-checked content (122K+ samples across 57 languages) and LLM-generated content (79K+ samples across 71 languages). BLUFF uniquely covers both high-resource "big-head" (20) and low-resource "long-tail" (59) languages, addressing critical gaps in multilingual research on detecting false and synthetic content. Our dataset features four content types (human-written, LLM-generated, LLM-translated, and hybrid human-LLM text), bidirectional translation (English $\leftrightarrow$ X), 39 textual modification…

Code & Models

Datasets

jsl5710/BLUFF
dataset · 1.8k downloads

Taxonomy

Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection