On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs

Herun Wan; Minnan Luo; Zhixiong Su; Guang Dai; Xiang Zhao

arXiv:2410.12600 · cs.CL · May 30, 2025

TL;DR

This paper investigates how LLM-driven evidence pollution can undermine evidence-enhanced malicious social text detectors, shows that polluted evidence causes significant performance drops, and proposes strategies to mitigate this risk.

Contribution

It identifies LLM-driven evidence pollution as a threat to malicious social text detection, proposes three defense strategies, and evaluates both their effectiveness and their practical limitations.

Findings

01. Evidence pollution causes up to a 14.4% performance drop (under the LLM generating strategy).

02. Polluted evidence is of high quality and affects model calibration.

03. Mitigation strategies can reduce the impact but face practical limitations.

Abstract

Evidence-enhanced detectors show remarkable ability in identifying malicious social text. However, the rise of large language models (LLMs) brings potential risks of evidence pollution that can confuse detectors. This paper explores potential manipulation scenarios, including basic pollution as well as rephrasing or generating evidence with LLMs. To mitigate the negative impact, we propose three defense strategies from the data and model sides: machine-generated text detection, a mixture of experts, and parameter updating. Extensive experiments on four malicious social text detection tasks with ten datasets show that evidence pollution significantly compromises detectors, with the generating strategy causing up to a 14.4% performance drop. Meanwhile, the defense strategies can mitigate evidence pollution, but they face limitations for practical deployment. Further analysis…
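The abstract names machine-generated text (MGT) detection as a data-side defense without describing its details. The sketch below only illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes that evidence is a list of texts (for example, replies attached to a post), that some MGT detector returns a score in [0, 1] where higher means "more likely LLM-written", and that a hard threshold of 0.5 is used. The names `filter_evidence`, `build_detector_input`, and `MGTScorer`, the input format, and the toy scores are all hypothetical.

```python
from typing import Callable, List

# Hypothetical scorer type: maps a text to an estimated probability that it
# was machine-generated (0.0 = human-like, 1.0 = LLM-like).
MGTScorer = Callable[[str], float]


def filter_evidence(
    evidence: List[str],
    mgt_score: MGTScorer,
    threshold: float = 0.5,
) -> List[str]:
    """Keep only evidence items scored strictly below the threshold.

    Mimics a data-side defense: evidence suspected to be LLM-written is
    removed before it reaches the evidence-enhanced detector.
    """
    return [item for item in evidence if mgt_score(item) < threshold]


def build_detector_input(post: str, evidence: List[str]) -> str:
    """Concatenate a post with its surviving evidence (hypothetical format)."""
    parts = [f"Post: {post}"]
    parts += [f"Evidence {i + 1}: {text}" for i, text in enumerate(evidence)]
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy scores standing in for the output of a real MGT detector.
    toy_scores = {
        "Reply from a user who saw the event firsthand.": 0.1,
        "An LLM-rephrased comment supporting the claim.": 0.9,
    }
    comments = list(toy_scores)
    kept = filter_evidence(comments, lambda text: toy_scores[text])
    print(build_detector_input("Breaking: the bridge has collapsed.", kept))
```

Hard filtering is the simplest choice. However, any false positive from the MGT detector silently discards genuine evidence. This is one reason a data-side defense on its own may be hard to rely on in practice.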

Videos

On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs · Underline

Taxonomy

Topics: Spam and Phishing Detection · Digital and Cyber Forensics