LLM-Detector: Improving AI-Generated Chinese Text Detection with Open-Source LLM Instruction Tuning

Rongsheng Wang; Haoming Chen; Ruizhe Zhou; Han Ma; Yaofei Duan; Yanlan Kang; Songhua Yang; Baoyu Fan; Tao Tan

arXiv:2402.01158 · cs.CL · February 5, 2024 · 5 citations

TL;DR

This paper introduces LLM-Detector, an instruction-tuned LLM-based approach that significantly improves Chinese AI-generated text detection at both sentence and document levels, with strong out-of-domain generalization and easy customization.

Contribution

The paper presents a novel instruction-tuning method for LLMs to enhance Chinese AI-generated text detection, addressing limitations of existing models in out-of-domain scenarios.

Findings

1. Outperforms baseline methods in sentence-level detection
2. Demonstrates strong out-of-domain generalization
3. Easy to customize using open-source LLMs

Abstract

ChatGPT and other general-purpose large language models (LLMs) have achieved remarkable success, but they have also raised concerns about the misuse of AI-generated text. Existing AI-generated text detection models, such as those based on BERT and RoBERTa, are prone to in-domain overfitting, leading to poor out-of-domain (OOD) detection performance. In this paper, we first collected Chinese text responses to questions from multiple domains, written by human experts and generated by 9 types of LLMs, and further created a dataset that mixes human-written sentences with sentences polished by LLMs. We then proposed LLM-Detector, a novel method for both document-level and sentence-level text detection through instruction tuning of LLMs. Our method leverages the wealth of knowledge LLMs acquire during pre-training, enabling them to detect the text they generate. Instruction tuning aligns the model's responses…
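As a rough illustration of the document-level setup the abstract describes, the sketch below formats labeled texts as instruction-tuning records. It is a minimal sketch under stated assumptions, not the authors' pipeline: the prompt wording, the `instruction`/`input`/`output` field names, the `human`/`AI` label strings, and the helper names `build_record` and `to_jsonl` are all hypothetical, and the sample texts are invented for illustration.

```python
import json

# Hypothetical prompt wording; the template used in the paper is not given here.
INSTRUCTION = (
    "Decide whether the following Chinese text was written by a human "
    "or generated by an AI. Answer with 'human' or 'AI'."
)

# Hypothetical label vocabulary for document-level detection.
ALLOWED_LABELS = {"human", "AI"}


def build_record(text: str, label: str) -> dict:
    """Wrap one labeled text as an instruction-tuning example."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "instruction": INSTRUCTION,
        "input": text.strip(),   # the text to classify
        "output": label,         # the target the model is tuned to produce
    }


def to_jsonl(records: list) -> str:
    """Serialize records one JSON object per line, keeping Chinese characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Invented sample texts, used only to show the record shape.
    samples = [
        ("今天天气很好，我们去公园散步吧。", "human"),
        ("作为一个人工智能语言模型，我可以为您提供以下建议。", "AI"),
    ]
    print(to_jsonl([build_record(text, label) for text, label in samples]))
```

Records in this shape can be fed to most open-source instruction-tuning toolkits. A sentence-level variant would need a per-sentence labeling scheme, which this sketch does not attempt.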

Code & Models

Repositories

QiYuan-tech/LLM-Detector
Official

Datasets

QiYuan-tech/LLM-Detector
Dataset · 17 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Dense Connections · WordPiece · Dropout · Softmax · Attention Dropout · Adam