$k$NNProxy: Efficient Training-Free Proxy Alignment for Black-Box Zero-Shot LLM-Generated Text Detection
Kahim Wong, Kemou Li, Haiwei Wu, Jiantao Zhou

TL;DR
The paper introduces $k$NNProxy, a training-free, query-efficient method for aligning a proxy LLM with a black-box source LLM, improving zero-shot detection of LLM-generated text while addressing domain shift and deployment cost.
Contribution
It proposes a $k$NN-based proxy alignment framework that avoids proxy fine-tuning and repeated API interactions, improving the robustness and efficiency of LLM-generated text (LGT) detection.
Findings
Achieves strong detection performance in experiments.
Effectively handles domain shifts with a mixture of proxies.
Eliminates the need for proxy fine-tuning or API access during inference.
Abstract
LLM-generated text (LGT) detection is essential for reliable forensic analysis and for mitigating LLM misuse. Existing LGT detectors can generally be categorized into two broad classes: learning-based approaches and zero-shot methods. Compared with learning-based detectors, zero-shot methods are particularly promising because they eliminate the need to train task-specific classifiers. However, the reliability of zero-shot methods fundamentally relies on the assumption that an off-the-shelf proxy LLM is well aligned with the often unknown source LLM, a premise that rarely holds in real-world black-box scenarios. To address this discrepancy, existing proxy alignment methods typically rely on supervised fine-tuning of the proxy or repeated interactions with commercial APIs, thereby increasing deployment costs, exposing detectors to silent API changes, and limiting robustness under domain…
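To make the proxy-alignment assumption concrete, here is a minimal toy sketch of a likelihood-based zero-shot score. It is not the paper's method: the unigram "proxies", their probabilities, the sample tokens, and the function name `mean_log_likelihood` are all hypothetical stand-ins chosen for illustration. It only shows that the same generated text receives a noticeably lower score when the proxy's distribution differs from the source's, which is the kind of mismatch proxy alignment is meant to reduce.

```python
import math

def mean_log_likelihood(tokens, proxy_probs, floor=1e-6):
    """Average per-token log-probability of `tokens` under a toy unigram proxy.

    Tokens missing from the proxy's table fall back to a small floor probability.
    """
    return sum(math.log(proxy_probs.get(t, floor)) for t in tokens) / len(tokens)

# Hypothetical toy distributions (assumptions for illustration, not from the paper).
# The "aligned" proxy is close to the source that produced the text;
# the "misaligned" proxy puts its probability mass elsewhere.
aligned_proxy = {"the": 0.4, "model": 0.3, "generates": 0.2, "text": 0.1}
misaligned_proxy = {"the": 0.1, "model": 0.1, "generates": 0.1, "text": 0.7}

# Hypothetical sample standing in for LLM-generated text.
generated = ["the", "model", "generates", "the", "text"]

score_aligned = mean_log_likelihood(generated, aligned_proxy)
score_misaligned = mean_log_likelihood(generated, misaligned_proxy)

# A detector that thresholds this score would behave differently
# depending on which proxy it happens to use.
print(f"aligned proxy score:    {score_aligned:.3f}")
print(f"misaligned proxy score: {score_misaligned:.3f}")
```

In this toy setting, a fixed decision threshold that works for the aligned proxy can miss the same text under the misaligned one, which is why detectors built on off-the-shelf proxies can degrade in black-box settings.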
