Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model

Runheng Liu; Heyan Huang; Xingchen Xiao; Zhijing Wu

arXiv:2604.21223 · cs.CL · April 24, 2026

TL;DR

This paper introduces IRM, a zero-shot method that uses implicit reward models to detect LLM-generated text without additional training. On the DetectRL benchmark it outperforms existing zero-shot and supervised detectors.

Contribution

The paper presents IRM, a zero-shot detection approach that derives implicit reward models from publicly available instruction-tuned and base models, eliminating the need for preference collection or task-specific fine-tuning.

Findings

01

IRM achieves superior detection performance on the DetectRL benchmark.

02

IRM outperforms existing zero-shot and supervised detection methods.

03

IRM does not require preference collection or additional training.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their ability to generate human-like text has raised concerns about potential misuse. This underscores the need for reliable and effective methods to detect LLM-generated text. In this paper, we propose IRM, a novel zero-shot approach that leverages Implicit Reward Models for LLM-generated text detection. Such implicit reward models can be derived from publicly available instruction-tuned and base models. Prior reward-based detection relies on preference construction and task-specific fine-tuning. In comparison, IRM requires neither preference collection nor additional training. We evaluate IRM on the DetectRL benchmark and demonstrate that it achieves superior detection performance, outperforming existing zero-shot and supervised methods in LLM-generated text detection.
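
The abstract does not spell out how the implicit reward is computed, so the following is only a minimal sketch under stated assumptions. It uses the log-likelihood-ratio form common in the DPO literature, where the reward of a text is the scaled difference between its log-probability under an instruction-tuned model and under the matching base model. Whether IRM uses exactly this form, any length normalization, or this decision direction is an assumption. The names `implicit_reward` and `is_llm_generated`, the `beta` scale, the threshold, and the toy per-token log-probabilities are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def implicit_reward(
    tuned_logprobs: Sequence[float],
    base_logprobs: Sequence[float],
    beta: float = 1.0,
    length_normalize: bool = True,
) -> float:
    """Assumed DPO-style implicit reward for one text.

    Computes beta * sum_t [log p_tuned(y_t) - log p_base(y_t)], optionally
    divided by the number of tokens. Both inputs are per-token
    log-probabilities of the SAME token sequence under each model.
    """
    if len(tuned_logprobs) != len(base_logprobs):
        raise ValueError("both models must score the same token sequence")
    if not tuned_logprobs:
        raise ValueError("empty token sequence")
    total = sum(t - b for t, b in zip(tuned_logprobs, base_logprobs))
    if length_normalize:
        total /= len(tuned_logprobs)
    return beta * total


def is_llm_generated(score: float, threshold: float = 0.0) -> bool:
    """Assumed decision rule: a higher implicit reward suggests LLM text."""
    return score > threshold


# Toy, made-up log-probabilities (not real model outputs).
machine_like = implicit_reward([-1.0, -0.5, -0.5], [-2.0, -1.5, -1.0])
human_like = implicit_reward([-3.0, -2.0], [-2.5, -2.0])
print(is_llm_generated(machine_like), is_llm_generated(human_like))
```

In practice the per-token log-probabilities would come from running both models over the same tokenized text, and the threshold would be chosen on held-out data rather than fixed at zero.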

Videos

Zero-Shot Detection of LLM-Generated Text via Implicit Reward Model · SlidesLive