Base Models Look Human To AI Detectors

Yixuan Even Xu; Ziqian Zhong; Aditi Raghunathan; Fei Fang; J. Zico Kolter

arXiv:2605.19516·cs.CL·May 20, 2026


TL;DR

This paper reveals that text generated by base models often appears more human-like to AI-text detectors than text generated by their instruction-tuned counterparts, and introduces HIP, a fine-tuning pipeline that improves evasion of commercial detectors across multiple model families.

Contribution

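The paper uncovers a surprising detector bias: commercial detectors readily flag text from instruction-tuned models but often judge base-model text as human. Building on this, it proposes HIP, a paraphrasing-based fine-tuning method that evades detectors more effectively than the baselines tested.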

Findings

01

Text from base models is judged more human-like than text from instruction-tuned models.

02

HIP improves detector evasion across various models and sizes.

03

Detectors key on artifacts of instruction tuning and on local context, rather than on features that are invariant across machine-generated text.

Abstract

As AI-generated text enters the real world at scale, institutions increasingly use commercial AI-text detectors, especially in education and academic-integrity workflows. We report a surprising empirical finding about such systems: when evaluated by GPTZero and Pangram, generated text from base models is often judged overwhelmingly human, whereas text generated by their instruction-tuned counterparts is not. Building on this observation, we propose Humanization by Iterative Paraphrasing (HIP), a detector-agnostic pipeline that minimally fine-tunes a base model into a paraphraser and applies it iteratively. Compared with the baselines we test, HIP yields a stronger trade-off between semantic preservation and detector evasion on commercial detectors. Across Llama-3 and Qwen-3 families, spanning model sizes from 0.6B to 70B, HIP consistently improves detector human-likeness. Our findings…
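The iterative part of HIP described in the abstract (fine-tune a base model into a paraphraser, then apply it repeatedly) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `paraphrase`, `human_score`, and `similarity` are hypothetical placeholders standing in for the fine-tuned paraphraser, a detector's human-likeness score, and a semantic-similarity metric. The selection rule in `pick_candidate` (keep the most human-scored round that stays above a similarity floor) is an assumed way to express the semantic-preservation versus evasion trade-off; the repository's actual interface and stopping criterion may differ.

```python
from typing import Callable, List, Optional

# All callables below are hypothetical placeholders, not the HIP repo's API.
Paraphraser = Callable[[str], str]          # stands in for the fine-tuned base model
Scorer = Callable[[str], float]             # stands in for a detector's "human" score
Similarity = Callable[[str, str], float]    # stands in for a semantic-similarity metric


def iterative_paraphrase(text: str, paraphrase: Paraphraser, rounds: int) -> List[str]:
    """Apply the paraphraser `rounds` times, feeding each output back in.

    Returns [original, round_1, ..., round_N] so every round can be inspected.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    history = [text]
    for _ in range(rounds):
        history.append(paraphrase(history[-1]))
    return history


def pick_candidate(
    source: str,
    candidates: List[str],
    human_score: Scorer,
    similarity: Similarity,
    min_similarity: float,
) -> Optional[str]:
    """Assumed selection rule: among candidates still similar enough to the
    source, return the one the detector scores as most human; None if none qualify."""
    best: Optional[str] = None
    best_score = float("-inf")
    for cand in candidates:
        if similarity(source, cand) < min_similarity:
            continue  # drifted too far from the original meaning
        score = human_score(cand)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Toy stand-ins: each "paraphrase" appends "!", the "detector" counts "!",
    # and "similarity" shrinks as the length difference grows.
    toy_sim = lambda a, b: 1.0 - abs(len(a) - len(b)) / max(len(a), len(b))
    rounds = iterative_paraphrase("hello", lambda s: s + "!", 3)
    print(pick_candidate("hello", rounds, lambda s: s.count("!"), toy_sim, 0.7))
    # prints "hello!!"
```

Keeping every intermediate round, rather than only the last one, lets the caller trade more paraphrasing passes (potentially more evasion) against drift from the source text.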

Code & Models

Repositories

yixuanevenxu/humanization-by-iterative-paraphrasing
github

Datasets

YixuanEvenXu/HIP-training-and-evaluation-data
dataset