DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains

Zhihui Chen; Kai He; Yucheng Huang; Yunxiao Zhu; Mengling Feng

arXiv:2506.06705·cs.CL·June 10, 2025

TL;DR

DivScore is a zero-shot method for detecting LLM-generated text in specialized domains such as medicine and law. It outperforms existing detectors, especially under domain shift and adversarial conditions.

Contribution

We introduce DivScore, a zero-shot detection framework utilizing normalized entropy scoring and domain knowledge distillation, along with a new benchmark for medical and legal text detection.

Findings

01

DivScore achieves 14.4% higher AUROC than state-of-the-art detectors.

02

DivScore attains 64.0% higher recall at a 0.1% false positive rate.

03

DivScore shows a 22.8% AUROC advantage in adversarial settings.
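The two metrics reported above, AUROC and recall at a fixed false positive rate, can be computed from raw detector scores as in the sketch below. This is an illustration only, not the paper's evaluation code: the function names `auroc` and `recall_at_fpr` are hypothetical, and it assumes that higher scores mean "more likely LLM-generated" and that a text is flagged only when its score strictly exceeds the threshold.

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def recall_at_fpr(scores_pos, scores_neg, max_fpr=0.001):
    """Recall at the most permissive threshold whose false positive rate is <= max_fpr."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.sort(np.asarray(scores_neg, dtype=float))[::-1]  # descending order
    # Number of human-written (negative) texts we may wrongly flag.
    allowed = int(np.floor(max_fpr * len(neg)))
    # Assumption: a text is flagged only if its score strictly exceeds the threshold.
    threshold = neg[allowed] if allowed < len(neg) else -np.inf
    return float(np.mean(pos > threshold))

# Toy scores (made up for illustration, not from the paper).
llm_scores = [0.9, 0.8, 0.4]
human_scores = [0.7, 0.3, 0.2, 0.1]
print(auroc(llm_scores, human_scores))
print(recall_at_fpr(llm_scores, human_scores, max_fpr=0.001))
```

At a 0.1% false positive rate, the threshold is set almost entirely by the highest-scoring human texts, which is why recall at that operating point is a much stricter measure than AUROC.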

Abstract

Detecting LLM-generated text in specialized and high-stakes domains like medicine and law is crucial for combating misinformation and ensuring authenticity. However, current zero-shot detectors, while effective on general text, often fail when applied to specialized content due to domain shift. We provide a theoretical analysis showing this failure is fundamentally linked to the KL divergence between human, detector, and source text distributions. To address this, we propose DivScore, a zero-shot detection framework using normalized entropy-based scoring and domain knowledge distillation to robustly identify LLM-generated text in specialized domains. We also release a domain-specific benchmark for LLM-generated text detection in the medical and legal domains. Experiments on our benchmark show that DivScore consistently outperforms state-of-the-art detectors, with 14.4% higher AUROC and…
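The abstract refers to two quantities, KL divergence and normalized entropy, without giving formulas. The sketch below shows their standard textbook definitions for discrete distributions, such as a next-token distribution over a vocabulary. It is not the DivScore scoring function, whose exact form is not stated here; the function names are hypothetical, and normalizing entropy by the log of the support size is an assumption that may differ from the paper's normalization.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # renormalize defensively
    q = q / q.sum()
    mask = p > 0     # terms with p_i = 0 contribute nothing
    # eps guards against division by zero where q_i = 0 but p_i > 0.
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

def normalized_entropy(p):
    """Shannon entropy divided by log(support size), so the result lies in [0, 1].

    Assumption: this is the generic normalization, not necessarily the paper's.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log(nz))
    return float(entropy / np.log(len(p)))

# Toy distributions (made up for illustration).
general_model = [0.5, 0.5]
domain_text = [0.9, 0.1]
print(kl_divergence(general_model, domain_text))
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))
```

A large KL divergence between the distribution a detector model assumes and the distribution of the text it actually sees is one way to quantify the domain shift the abstract describes.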

Taxonomy

Topics: Topic Modeling · Authorship Attribution and Profiling · Misinformation and Its Impacts