IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

Zheng Chen; Yushi Feng; Jisheng Dang; Yue Deng; Changyang He; Hongxi Pu; Haoxuan Li; Bo Li

arXiv:2502.15902 · cs.LG · November 19, 2025

TL;DR

IPAD is a framework for detecting LLM-generated text that predicts the prompt that could have produced an input and checks how well the text aligns with that prompt, improving detection on in-distribution and out-of-distribution data while providing interpretable evidence for its decisions.

Contribution

The paper presents IPAD, a novel inverse prompt-based framework that enhances robustness and interpretability in detecting LLM-generated texts, outperforming existing methods.

Findings

01. IPAD achieves 9.05% higher recall on in-distribution data.

02. IPAD improves AUROC by 12.93% on out-of-distribution data.

03. IPAD provides interpretable evidence supporting detection decisions.
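
For readers less familiar with the two metrics quoted in the findings, here is a minimal sketch of how recall and AUROC are commonly computed for a binary detector. It is not the paper's evaluation code: the function names and the toy data are illustrative assumptions, and the exact recall variant used in the paper may differ.

```python
from typing import Sequence


def recall(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of positive examples (label 1) that were predicted positive."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined without positive examples")
    return true_pos / (true_pos + false_neg)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the pairwise, Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 1 = LLM-generated, 0 = human-written.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]
```

Recall depends on the decision threshold (0.5 in the toy data), whereas AUROC summarizes ranking quality across all thresholds, which is one reason it is often reported for out-of-distribution comparisons.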

Abstract

Large Language Models (LLMs) have attained human-level fluency in text generation, which makes it difficult to distinguish human-written from LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, a property that is critical in real-world scenarios. They also struggle to provide interpretable evidence to support their decisions, which undermines their reliability. In light of these challenges, we propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and two Distinguishers that examine the probability that the input texts align with the predicted prompts. Empirical evaluations demonstrate that IPAD outperforms the strongest…
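
The control flow described in the abstract (invert the text into a predicted prompt, then have two distinguishers score text-prompt alignment) can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' implementation: the type aliases, `Verdict`, `detect`, the toy stand-in functions, the averaging rule, and the 0.5 threshold are all hypothetical, since the abstract does not say how the two distinguisher outputs are combined.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interfaces: in the paper these roles are played by learned models.
PromptInverter = Callable[[str], str]          # text -> predicted prompt
Distinguisher = Callable[[str, str], float]    # (text, prompt) -> alignment score in [0, 1]


@dataclass
class Verdict:
    predicted_prompt: str          # kept so the decision can be inspected
    scores: Tuple[float, float]    # one alignment score per distinguisher
    is_llm_generated: bool


def detect(
    text: str,
    invert_prompt: PromptInverter,
    distinguishers: Tuple[Distinguisher, Distinguisher],
    threshold: float = 0.5,        # assumed value, not from the paper
) -> Verdict:
    prompt = invert_prompt(text)
    first = distinguishers[0](text, prompt)
    second = distinguishers[1](text, prompt)
    # ASSUMPTION: a simple mean; the paper's combination rule may differ.
    combined = (first + second) / 2
    return Verdict(prompt, (first, second), combined >= threshold)


# Toy stand-ins so the sketch runs end to end; they carry no detection ability.
def toy_inverter(text: str) -> str:
    return "Describe: " + " ".join(text.split()[:3])


def toy_overlap(text: str, prompt: str) -> float:
    prompt_words = set(prompt.lower().split())
    text_words = set(text.lower().split())
    return len(prompt_words & text_words) / max(len(prompt_words), 1)


def toy_length(text: str, prompt: str) -> float:
    return min(len(text.split()) / 10, 1.0)


verdict = detect(
    "Large language models write fluent text",
    toy_inverter,
    (toy_overlap, toy_length),
)
```

Returning the predicted prompt and the per-distinguisher scores alongside the final label mirrors the interpretability goal stated in the abstract: a reader can inspect which prompt the decision was based on.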

Code & Models

Repositories

Bellafc/IPAD-Inver-Prompt-for-AI-Detection- (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: ALIGN