Raidar: geneRative AI Detection viA Rewriting

Chengzhi Mao; Carl Vondrick; Hao Wang; Junfeng Yang

arXiv:2401.12970 · cs.CL · April 16, 2024 · 5 cites

TL;DR

Raidar leverages LLMs' rewriting tendencies to detect AI-generated text by measuring editing distances, significantly enhancing detection accuracy across diverse content types.

Contribution

Introduces Raidar, a novel detection method based on LLM rewriting behavior that improves on existing AI-content detection models without relying on high-dimensional features.

Findings

01

Raidar improves the F1 detection scores of existing detectors by up to 29 points.

02

Effective across multiple domains including news, essays, and code.

03

Compatible with black box LLMs and robust on new content.

Abstract

We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency arises because LLMs often perceive AI-generated text as high-quality, leading to fewer modifications. We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output. We dubbed our geneRative AI Detection viA Rewriting method Raidar. Raidar significantly improves the F1 detection scores of existing AI content detection models -- both academic and commercial -- across various domains, including News, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Operating solely on word symbols without high-dimensional features, our method is compatible with black box LLMs, and is inherently robust on new content. Our results…
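The core loop described in the abstract (rewrite the text, then measure how much the rewrite changed it) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration, not the authors' implementation: `rewrite` stands in for any call to a black-box LLM, the distance is a plain word-level Levenshtein distance, and the `threshold` value is arbitrary. The abstract does not specify which distance features or decision rule the paper uses.

```python
from typing import Callable, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def rewrite_score(text: str, rewrite: Callable[[str], str]) -> float:
    """Normalized word-level edit distance between a text and its rewrite.

    `rewrite` is a hypothetical callable wrapping an LLM rewriting prompt.
    Returns 0.0 when the rewrite leaves the text unchanged.
    """
    original = text.split()
    rewritten = rewrite(text).split()
    longest = max(len(original), len(rewritten))
    if longest == 0:
        return 0.0
    return word_edit_distance(original, rewritten) / longest


def looks_ai_generated(text: str, rewrite: Callable[[str], str],
                       threshold: float = 0.2) -> bool:
    """Flag text whose rewrite changes little. The threshold is an illustrative guess."""
    return rewrite_score(text, rewrite) < threshold
```

In practice the threshold (or a classifier over several distance features) would need to be fit on labeled human and AI text for the chosen rewriting model and prompt.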

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. Simple method for detecting AI-generated text that essentially consists of asking an LM to rewrite the input and then computing the edit distance between the input and the rewritten text. Computationally simple.
2. Significant improvement in detection effectiveness over the baselines.
3. Interesting study on robustness, source of generated data, different LLMs for rewriting, impact of prompts used for rewriting, and the length of the input text.

Weaknesses

1. The proposed method is critically dependent on the LLM. Any change to the LLM due to continual fine-tuning with new data might have unknown consequences for the detection method. The method might not be robust to LLM fine-tuning.
2. Though the proposed method is simple and computationally low cost, there is a cost associated with every call to the detection algorithm because of the calls made to the LLM for rewriting. This might not be acceptable in some scenarios it is cheaper and desirable to have a model

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

* The paper tackles a task that is very important for mitigating the misuse of LLMs.
* The proposed method is simple yet effective. It is also very intuitive, since it is human nature to prefer (i.e., criticize less) your own writing over that of others.
* The paper is very well written.

Weaknesses

* It would be nice to actually check the quality of the human- and LM-written text, i.e., which of the two does another set of humans prefer? It could be that the LM-written text is actually of better quality (and thus requires fewer rewrites).
* It would be nice to include a discussion on how this method fares when the LLM is instruction-tuned to rewrite the way humans do. Given that text rewriting is a widely used use case for LLMs, it would not be a surprise if these models

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

- The core idea in this work, that LLMs are less likely to make extensive edits to LLM-written text than to human-written text, is an intuitive and seemingly very reasonable approach.
- The paper obtains impressive performance compared to state-of-the-art models (Ghostbuster and GPTZero). In particular, the model seems to work especially well on short documents, which is a known failure mode for existing approaches.
- The paper introduces three new detection datasets (Yelp reviews, code, arXiv abstracts) an

Weaknesses

1. The paper is missing some important details. For example, the paper does not provide a complete explanation of how the human reviews from the Yelp dataset were selected, nor does it provide a full explanation of how GPT 3.5 was prompted to generate data in the code/Yelp/arXiv domains. The paper also does not describe which scoring model was used for the DetectGPT results.
2. I believe the results on OOD generalization are slightly misleading. Because DetectGPT and GPTZero should produce the

Code & Models

Videos

Raidar: geneRative AI Detection viA Rewriting · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)