LLM Output Detectability and Task Performance Can be Jointly Optimized

Koshiro Saito; Ryuto Koike; Masahiro Kaneko; Naoaki Okazaki

arXiv:2605.01350 · cs.CL · May 5, 2026


TL;DR

This paper introduces PUPPET, a reinforcement learning framework that improves both the detectability and the downstream task performance of LLM outputs, matching watermarking methods on detectability while outperforming them on downstream tasks.

Contribution

PUPPET fine-tunes an LLM with reinforcement learning to jointly optimize detectability and task performance, achieving high detectability without sacrificing downstream effectiveness.
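The abstract says the joint optimization uses two reward functions: a detector's machine-class likelihood and an evaluator's task-specific metric. The sketch below shows one way such rewards could be folded into a single RL signal. It is not the authors' implementation: the linear mix, the weight `alpha`, the [0, 1] normalization, and the GRPO-style group-normalized advantage are all assumptions, and `joint_reward` and `group_advantages` are hypothetical names.

```python
import statistics
from typing import Callable, List, Sequence


def joint_reward(
    text: str,
    detector: Callable[[str], float],
    evaluator: Callable[[str], float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical scalar reward mixing detectability and task quality.

    `detector` returns a machine-class likelihood in [0, 1]; `evaluator`
    returns a task metric assumed to be normalized to [0, 1]. The linear
    mix and the weight `alpha` are illustrative assumptions.
    """
    p_machine = detector(text)
    task_score = evaluator(text)
    for name, value in (("detector", p_machine), ("evaluator", task_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} outside [0, 1]")
    return alpha * p_machine + (1.0 - alpha) * task_score


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Group-normalized advantages over samples for one prompt.

    This GRPO-style normalization is an assumed choice of RL algorithm;
    identical rewards yield all-zero advantages.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Raising `alpha` trades task quality for detectability. Normalizing rewards within a group of samples for the same prompt keeps the two reward scales from dominating the policy update purely through their magnitudes.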

Findings

1. PUPPET achieves high detectability, comparable to watermarking.
2. It outperforms watermarking on downstream tasks such as long-form QA and summarization.
3. Optimization is efficient, requiring only a few thousand samples and few GPU hours.

Abstract

Detecting machine-generated text is essential for transparency and accountability when deploying large language models (LLMs). Among detection approaches, watermarking is statistically reliable by design: it embeds detectable signals into LLM outputs by biasing their token distributions. However, watermarked LLMs have been reported to often perform worse on downstream tasks. We propose PUPPET, a framework that fine-tunes an LLM via reinforcement learning to generate text that is both more detectable and better performing on downstream tasks. We use two reward functions: a detector that outputs a machine-class likelihood and an evaluator that measures a task-specific metric. Experiments on long-form QA, summarization, and essay writing show that LLMs trained with PUPPET achieve high detectability competitive with watermarking methods while outperforming them on downstream…
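To make "biasing their token distributions" concrete, here is a minimal sketch of one well-known watermark of this kind, the green-list scheme of Kirchenbauer et al. (2023). The abstract does not say which watermarking methods PUPPET is compared against, so treat this as an illustrative stand-in. The key `SECRET_KEY`, the hashing of the previous token, and the function names are hypothetical. At each step, a pseudo-random fraction `gamma` of the vocabulary is marked "green" and those logits are raised by `delta`. Detection then runs a one-proportion z-test on how many tokens landed in their green list, which is where the statistical reliability comes from.

```python
import hashlib
import math
import random
from typing import List, Sequence, Set

SECRET_KEY = 42  # hypothetical watermark key shared by generator and detector


def green_list(prev_token: int, vocab_size: int, gamma: float) -> Set[int]:
    """Pick a pseudo-random gamma-fraction 'green' subset of the vocabulary,
    seeded by the previous token and the secret key."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(
    logits: Sequence[float], prev_token: int, gamma: float, delta: float
) -> List[float]:
    """Add delta to every green-token logit before sampling."""
    greens = green_list(prev_token, len(logits), gamma)
    return [x + delta if i in greens else x for i, x in enumerate(logits)]


def z_score(tokens: Sequence[int], vocab_size: int, gamma: float) -> float:
    """One-proportion z-test on the count of tokens in their green list."""
    scored = len(tokens) - 1  # each scored token needs a predecessor
    if scored < 1:
        raise ValueError("need at least two tokens")
    hits = sum(
        cur in green_list(prev, vocab_size, gamma)
        for prev, cur in zip(tokens, tokens[1:])
    )
    return (hits - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))
```

Text sampled from `bias_logits` tends to push `z_score` far above a detection threshold (for example z = 4), while unwatermarked text stays near 0. The same logit bias is what can steer generation away from the tokens a task needs. By contrast, the abstract describes PUPPET as fine-tuning the model itself against a detector reward rather than biasing logits at decoding time.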
