PhantomHunter: Detecting Unseen Privately-Tuned LLM-Generated Text via Family-Aware Learning

Yuhui Shi; Yehan Yang; Qiang Sheng; Hao Mi; Beizhe Hu; Chaoxi Xu; Juan Cao

arXiv:2506.15683 · cs.CL · June 19, 2025

TL;DR

PhantomHunter is a detector designed to identify text generated by unseen, privately-tuned LLMs. It does so by capturing traits shared within an LLM family, and it improves detection accuracy over existing methods.

Contribution

It introduces a family-aware learning framework that effectively detects privately-tuned LLM-generated text, addressing a key gap in current detection capabilities.

Findings

1. Achieves an F1 score above 96% on multiple LLM families
2. Outperforms 7 baseline detectors and 3 industrial services
3. Remains robust on text from unseen privately-tuned models

Abstract

With the popularity of large language models (LLMs), societal problems such as misinformation production and academic misconduct have become more severe, making the detection of LLM-generated text more important than ever. Although existing methods have made remarkable progress, a new challenge remains underexplored: text from privately tuned LLMs. Users can easily obtain a private LLM by fine-tuning an open-source one on private corpora, which causes existing detectors to suffer a significant performance drop in practice. To address this issue, we propose PhantomHunter, an LLM-generated text detector specialized for text from unseen, privately-tuned LLMs. Its family-aware learning framework captures family-level traits shared across the base models and their derivatives, instead of memorizing the characteristics of individual models. Experiments on data from LLaMA, Gemma, and…
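
The abstract does not describe PhantomHunter's training or evaluation procedure. As a hypothetical illustration only of what "family-level traits instead of individual characteristics" means for the learning target, the sketch below labels each text with its model family rather than its exact source model, and keeps some fine-tuned derivatives entirely out of training so they stand in for unseen private models. The model names, the `FAMILY_OF` mapping, and the `family_split` helper are all invented for this example; this is not the authors' method.

```python
# Hypothetical mapping from source model to its family. All model names
# here are invented for illustration; they are not the paper's models.
FAMILY_OF = {
    "llama-base": "LLaMA",
    "llama-ft-medical": "LLaMA",
    "llama-ft-legal": "LLaMA",
    "gemma-base": "Gemma",
    "gemma-ft-code": "Gemma",
    "gemma-ft-news": "Gemma",
}


def family_split(samples, unseen_models):
    """Split (text, source_model) pairs into family-labelled train/test lists.

    Each text is labelled with its model family rather than the individual
    model, and every model in `unseen_models` is kept out of training, so the
    test list mimics text from fine-tuned derivatives the detector never saw.
    """
    train, test = [], []
    for text, model in samples:
        example = (text, FAMILY_OF[model])
        if model in unseen_models:
            test.append(example)
        else:
            train.append(example)
    return train, test


samples = [
    ("text 1", "llama-base"),
    ("text 2", "llama-ft-medical"),
    ("text 3", "llama-ft-legal"),
    ("text 4", "gemma-base"),
    ("text 5", "gemma-ft-code"),
    ("text 6", "gemma-ft-news"),
]
train, test = family_split(samples, unseen_models={"llama-ft-legal", "gemma-ft-news"})
print(train)  # [('text 1', 'LLaMA'), ('text 2', 'LLaMA'), ('text 4', 'Gemma'), ('text 5', 'Gemma')]
print(test)   # [('text 3', 'LLaMA'), ('text 6', 'Gemma')]
```

Because each held-out derivative shares a family label with models seen during training, a classifier trained on family labels has, in principle, a correct target for it, whereas a classifier keyed to individual model identities has no label for a model it never saw.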

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Mathematics, Computing, and Information Processing

Methods: Balanced Selection · LLaMA