Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework
Zhen Tao, Zhiyu Li, Runyu Chen, Dinghao Xi, Wei Xu

TL;DR
This paper introduces a multi-level, fine-grained detection framework for identifying texts generated by large language models, combining structural, semantic, and linguistic features with contrastive learning to improve accuracy and robustness.
Contribution
The paper presents a novel multi-level detection framework that integrates structural, semantic, and linguistic features with contrastive learning to effectively identify LLM-generated texts, especially in academic contexts.
Findings
Outperforms existing detection methods with 88.56% accuracy.
Effectively detects subtle differences and paraphrased LLM texts.
Enhances robustness against evasion techniques.
Abstract
Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement. However, their widespread use raises concerns about authorship, originality, and ethics, even potentially threatening scholarly integrity. Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effectively identify LLM-generated text in academic contexts. To address these challenges, we propose a novel Multi-level Fine-grained Detection (MFD) framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features, while conducting sentence-level evaluations of lexicon, grammar, and syntax for comprehensive analysis. To improve detection of subtle differences in LLM-generated text and enhance robustness against paraphrasing, we…
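The summary describes MFD as fusing features from several levels and training with contrastive learning, but it gives no implementation details. Purely as an illustration, here is a minimal Python sketch under two assumptions of our own. First, the three feature levels are fused by simple concatenation. Second, the contrastive objective is a standard supervised contrastive (SupCon-style) loss that pulls same-label embeddings (human vs. LLM) together. The function names `fuse_features` and `supervised_contrastive_loss` are hypothetical. This is not the authors' method, and the paper's actual feature extractors, fusion step, and loss may differ.

```python
import math
from typing import List, Sequence


def fuse_features(structural: Sequence[float],
                  semantic: Sequence[float],
                  linguistic: Sequence[float]) -> List[float]:
    """Hypothetical fusion step (an assumption, not the paper's):
    concatenate the three feature levels into one vector."""
    return list(structural) + list(semantic) + list(linguistic)


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings: Sequence[Sequence[float]],
                                labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """SupCon-style loss (assumed objective, for illustration only).

    For each anchor i, positives are the other samples sharing its label.
    The anchor's loss is the mean negative log-probability of each positive
    under a softmax over all non-anchor samples; anchors without positives
    are skipped, and the result is averaged over the remaining anchors.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy batch: label 0 = human-written, 1 = LLM-generated (made-up numbers).
    batch = [
        fuse_features([1.0, 0.0], [0.9], [0.1]),
        fuse_features([0.9, 0.1], [1.0], [0.0]),
        fuse_features([0.0, 1.0], [0.1], [0.9]),
        fuse_features([0.1, 0.9], [0.0], [1.0]),
    ]
    print(supervised_contrastive_loss(batch, [0, 0, 1, 1]))
    print(supervised_contrastive_loss(batch, [0, 1, 0, 1]))
```

The toy vectors are invented for illustration. When the labels match the clusters in the fused features, the loss is lower than under a mismatched labeling. This is the behavior a contrastive objective relies on to separate subtly different human and LLM texts.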
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Masked autoencoder
