HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
Fangqi Dai, Xingjian Jiang, Zizhuang Deng

TL;DR
This paper introduces HLPD, a novel method that aligns language models to human writing styles to improve detection of machine-revised texts, especially against advanced LLM outputs and adversarial revisions.
Contribution
HLPD employs a reward-based alignment process to enhance the sensitivity of detection models to human-like writing, addressing limitations of previous methods in adversarial scenarios.
Findings
HLPD improves AUROC by 15.11% over ImBD on GPT-revised texts.
HLPD surpasses Fast-DetectGPT by 45.56% in AUROC on GPT-revised texts.
HLPD achieves the highest average AUROC on advanced LLM-generated texts.
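The findings above are stated in terms of AUROC, so a small sketch of how that metric and a percentage gain could be computed may help when reading them. This is an illustration only: the function names are hypothetical, the scores are made-up toy values, and the summary does not say whether the reported percentages are relative or absolute gains, so both readings are shown.

```python
def auroc(human_scores, machine_scores):
    """Pairwise AUROC: the fraction of (machine, human) pairs in which the
    machine-revised text gets the higher detection score; ties count as 0.5."""
    pairs = len(human_scores) * len(machine_scores)
    if pairs == 0:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / pairs


def relative_gain(baseline, improved):
    """Gain expressed as a fraction of the baseline AUROC."""
    return (improved - baseline) / baseline


def absolute_gain(baseline, improved):
    """Gain expressed as a plain difference in AUROC."""
    return improved - baseline


if __name__ == "__main__":
    # Toy detector scores (assumed values, not taken from the paper).
    human = [0.10, 0.35, 0.40]
    machine = [0.30, 0.70, 0.90]
    print("AUROC:", auroc(human, machine))

    # Hypothetical baseline and improved AUROC values, for illustration only.
    print("relative gain:", relative_gain(0.80, 0.92))
    print("absolute gain:", absolute_gain(0.80, 0.92))
```

The pairwise form is quadratic in the number of texts, which is fine for a small check; a rank-based implementation would be the usual choice for large evaluation sets.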
Abstract
To prevent misinformation and social issues arising from trustworthy-looking content generated by LLMs, it is crucial to develop efficient and reliable methods for identifying the source of texts. Previous approaches have demonstrated exceptional performance in detecting texts fully generated by LLMs. However, these methods struggle when confronting more advanced LLM output or text with adversarial multi-task machine revision, especially in the black-box setting, where the generating model is unknown. To address this challenge, grounded in the hypothesis that human writing possesses distinctive stylistic patterns, we propose Human Language Preference Detection (HLPD). HLPD employs a reward-based alignment process, Human Language Preference Optimization (HLPO), to shift the scoring model's token distribution toward human-like writing, making the model more sensitive to human writing,…
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Authorship Attribution and Profiling
