Can We Trust LLM Detectors?

Jivnesh Sandhan; Harshit Jaiswal; Fei Cheng; Yugo Murawaki

arXiv:2601.15301 · cs.CL · January 28, 2026

TL;DR

This paper evaluates the robustness of existing LLM detectors, revealing their brittleness under distribution shifts and proposing a supervised contrastive learning framework to improve domain generalization.

Contribution

It systematically assesses current detection methods and introduces a novel supervised contrastive learning approach for more reliable AI text detection across domains.

Findings

01. Supervised detectors perform well in-domain but poorly out-of-domain.

02. Training-free methods are highly sensitive to proxy choice.

03. Fundamental challenges exist in creating domain-agnostic detectors.

Abstract

The rapid adoption of LLMs has increased the need for reliable AI text detection, yet existing detectors often fail outside controlled benchmarks. We systematically evaluate two dominant paradigms (training-free and supervised) and show that both are brittle under distribution shift, unseen generators, and simple stylistic perturbations. To address these limitations, we propose a supervised contrastive learning (SCL) framework that learns discriminative style embeddings. Experiments show that while supervised detectors excel in-domain, they degrade sharply out-of-domain, and training-free methods remain highly sensitive to proxy choice. Overall, our results expose fundamental challenges in building domain-agnostic detectors. Our code is available at: https://github.com/HARSHITJAIS14/DetectAI
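The abstract does not spell out the SCL objective, so the following is only a minimal sketch of a generic supervised contrastive loss over style embeddings, assuming the commonly used formulation in which texts sharing a label (human or machine) are pulled together and all others are pushed apart. The function names `l2_normalize` and `supcon_loss`, the temperature value, and the toy two-dimensional embeddings are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not taken from the paper).

    For each anchor i, every other sample with the same label is a positive.
    The loss is the average negative log-probability of picking a positive
    among all other samples, using temperature-scaled cosine similarity.
    Anchors without any positive are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Scaled similarity of the anchor to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two near [1, 0] and two near [0, 1] (hypothetical values).
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy, [0, 0, 1, 1])     # labels match the clusters
misaligned = supcon_loss(toy, [0, 1, 0, 1])  # labels cut across the clusters
```

In this toy setup the loss is small when the labels agree with the geometric clusters and large when they cut across them, which is the pressure that would push an encoder toward label-discriminative embeddings. Whether embeddings trained this way transfer to unseen domains is exactly what the paper's out-of-domain results call into question.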

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling · Text and Document Classification Technologies