Unmasking the Imposters: How Censorship and Domain Adaptation Affect the Detection of Machine-Generated Tweets

Bryan E. Tuck; Rakesh M. Verma

arXiv:2406.17967 · cs.CL · January 22, 2025

TL;DR

This paper investigates how censorship and domain adaptation affect the detection of machine-generated tweets from several large language models. It finds that uncensored models weaken current detection methods and underscores the need to understand how content moderation affects generated text.

Contribution

The paper introduces a comprehensive dataset and analysis framework for evaluating how detectable machine-generated tweets from multiple LLMs are under different censorship conditions, with a focus on smaller open-source models.

Findings

01. Uncensored models significantly reduce detection effectiveness.

02. Censorship and domain adaptation alter textual features and detection performance.

03. Differences between human and machine-generated text are affected by censorship.

Abstract

The rapid development of large language models (LLMs) has significantly improved the generation of fluent and convincing text, raising concerns about their potential misuse on social media platforms. We present a comprehensive methodology for creating nine Twitter datasets to examine the generative capabilities of four prominent LLMs: Llama 3, Mistral, Qwen2, and GPT-4o. These datasets encompass four censored and five uncensored model configurations, including 7B and 8B parameter base-instruction models of the three open-source LLMs. Additionally, we perform a data quality analysis to assess the characteristics of textual outputs from human, "censored," and "uncensored" models, using semantic meaning, lexical richness, structural patterns, content characteristics, and detector performance metrics to identify differences and similarities. Our evaluation demonstrates that "uncensored"…
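
The abstract names lexical richness as one of the text characteristics compared across human, "censored," and "uncensored" outputs, but this page does not say which measure the authors use. As a rough illustration only, the sketch below computes a type-token ratio (distinct tokens divided by total tokens), one common lexical richness measure. The function names `type_token_ratio` and `mean_ttr` and the simple regex tokenizer are assumptions made for this example, not the paper's implementation.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct tokens / total tokens, or 0.0 for empty input.

    Illustrative measure only; the paper's actual lexical richness
    metric is not specified on this page.
    """
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mean_ttr(tweets: list[str]) -> float:
    """Average the per-tweet type-token ratio over a collection."""
    if not tweets:
        return 0.0
    return sum(type_token_ratio(t) for t in tweets) / len(tweets)


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from the paper's datasets.
    sample = ["the cat saw the cat", "a completely different short post"]
    print(mean_ttr(sample))
```

Type-token ratio is sensitive to text length, so comparing it across sources is most meaningful when the texts are of similar length, as tweets typically are.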

Taxonomy

Topics: Impact of Technology on Adolescents · Mental Health via Writing · Hate Speech and Cyberbullying Detection

Methods: LLaMA