Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework

Zhen Tao; Zhiyu Li; Runyu Chen; Dinghao Xi; Wei Xu

arXiv:2410.14231 · cs.CL · October 21, 2024

TL;DR

This paper introduces a multi-level, fine-grained detection framework for identifying texts generated by large language models, combining structural, semantic, and linguistic features with contrastive learning to improve accuracy and robustness.

Contribution

The paper presents a novel multi-level detection framework that integrates various linguistic features and contrastive learning to effectively identify LLM-generated texts, especially in academic contexts.
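This summary does not give the exact form of the framework's contrastive objective. As an illustration only, the sketch below uses a standard supervised contrastive (SupCon-style) loss, which is a swapped-in technique and not confirmed as MFD's formulation. Same-label texts (for example, human-written vs. LLM-generated) are pulled together in embedding space, and different-label texts are pushed apart. The function names `cosine` and `supervised_contrastive_loss` and the default `temperature=0.1` are hypothetical choices.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def supervised_contrastive_loss(
    embeddings: List[Sequence[float]], labels: List[int], temperature: float = 0.1
) -> float:
    # Hypothetical SupCon-style loss: for each anchor, every other sample with the
    # same label is a positive; all non-anchor samples form the denominator.
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a positive pair contribute nothing
        logits = {
            a: cosine(embeddings[i], embeddings[a]) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over every non-anchor sample.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: two clusters of 2-D embeddings (values are illustrative, not data).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(emb, [0, 0, 1, 1])  # labels match clusters
mixed = supervised_contrastive_loss(emb, [0, 1, 0, 1])    # labels cut across clusters
print(aligned, mixed)
```

When the labels agree with the embedding clusters, the loss is close to zero. When they cut across the clusters, it is large. Minimizing a loss like this encourages an encoder to separate the two classes even when the differences between their texts are subtle.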

Findings

1. Outperforms existing detection methods, reaching 88.56% accuracy.
2. Detects subtle differences and paraphrased LLM-generated texts effectively.
3. Improves robustness against evasion techniques.

Abstract

Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement. However, their widespread use raises concerns about authorship, originality, and ethics, even potentially threatening scholarly integrity. Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effectively identify LLM-generated text in academic contexts. To address these challenges, we propose a novel Multi-level Fine-grained Detection (MFD) framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features, while conducting sentence-level evaluations of lexicon, grammar, and syntax for comprehensive analysis. To improve detection of subtle differences in LLM-generated text and enhance robustness against paraphrasing, we…
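The abstract describes three feature levels: low-level structural, high-level semantic, and deep-level linguistic. It also describes sentence-level evaluation. The sketch below shows one possible way to fuse these levels into a single vector for a downstream classifier. It is not the MFD repository's code. All concrete features are hypothetical stand-ins: sentence-length and punctuation statistics represent the structural level, and per-sentence type-token ratio and mean word length represent the linguistic level. The semantic embedding is assumed to come from an external encoder and is passed in as a plain vector. Grammar and syntax checks are omitted because they would need a parser.

```python
import re
import statistics
from typing import List, Sequence


def split_sentences(text: str) -> List[str]:
    # Naive splitter on ., ! or ? followed by whitespace (assumption; a real
    # system would use a proper sentence tokenizer).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def structural_features(text: str) -> List[float]:
    # Hypothetical low-level features: surface statistics of the whole text.
    sentences = split_sentences(text)
    lengths = [len(s.split()) for s in sentences]
    punct = sum(1 for ch in text if ch in ",;:")
    words = max(1, len(text.split()))
    return [
        float(len(sentences)),
        statistics.mean(lengths) if lengths else 0.0,
        statistics.pstdev(lengths) if lengths else 0.0,
        punct / words,
    ]


def sentence_lexical_features(sentence: str) -> List[float]:
    # Hypothetical per-sentence lexical cues: type-token ratio, mean word length.
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    if not tokens:
        return [0.0, 0.0]
    return [len(set(tokens)) / len(tokens), sum(map(len, tokens)) / len(tokens)]


def linguistic_features(text: str) -> List[float]:
    # Sentence-level scores, averaged over the document.
    per_sentence = [sentence_lexical_features(s) for s in split_sentences(text)]
    if not per_sentence:
        return [0.0, 0.0]
    return [statistics.mean(col) for col in zip(*per_sentence)]


def fuse(text: str, semantic_embedding: Sequence[float]) -> List[float]:
    # Concatenate structural, semantic and linguistic levels into one vector.
    return (
        structural_features(text)
        + list(semantic_embedding)
        + linguistic_features(text)
    )


vec = fuse("The cat sat. The cat ran!", semantic_embedding=[0.5, -0.5])
print(vec)
```

Plain concatenation is the simplest fusion choice. A learned fusion layer, such as attention over the levels, is a common alternative. This summary does not say which one MFD uses.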

Code & Models

Repositories

TaoZhen1110/MFD (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Masked autoencoder