Naïve Exposure of Generative AI Capabilities Undermines Deepfake Detection

Sunpill Kim; Chanwoo Hwang; Minsu Kim; Jae Hong Seo

arXiv:2603.10504 · cs.CR · March 12, 2026

Open Access

TL;DR

This paper demonstrates that commercial generative AI systems, used with only benign, policy-compliant prompts, can undermine deepfake detection: the images they refine evade state-of-the-art detectors while retaining high perceptual quality, which poses a security risk.

Contribution

It shows that commercial generative AI systems externalize their authenticity reasoning, which can then be reused to evade deepfake detectors, and it highlights a gap between current threat models and real-world AI capabilities.

Findings

01. State-of-the-art detectors fail against AI-refined images

02. Commercial AI systems enable effective evasion by non-experts

03. Refined images maintain identity and high perceptual quality

Abstract

Generative AI systems increasingly expose powerful reasoning and image refinement capabilities through user-facing chatbot interfaces. In this work, we show that the naïve exposure of such capabilities fundamentally undermines modern deepfake detectors. Rather than proposing a new image manipulation technique, we study a realistic and already-deployed usage scenario in which an adversary uses only benign, policy-compliant prompts and commercial generative AI systems. We demonstrate that state-of-the-art deepfake detection methods fail under semantic-preserving image refinement. Specifically, we show that generative AI systems articulate explicit authenticity criteria and inadvertently externalize them through unrestricted reasoning, enabling their direct reuse as refinement objectives. As a result, refined images simultaneously evade detection, preserve identity as verified by…
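The abstract's joint claim (refined images both evade detection and preserve identity) can be made concrete with a small bookkeeping sketch for detector robustness evaluation. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the score convention (higher means "more likely fake"), both thresholds, and the use of cosine similarity between face embeddings as the identity check. It only summarizes precomputed detector scores and embeddings; it does not run a detector or refine images.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def summarize_refinement(scores_before, scores_after, emb_before, emb_after,
                         detect_threshold=0.5, identity_threshold=0.8):
    """Summarize how refinement changed detection and identity outcomes.

    Hypothetical conventions (not from the paper):
    - a score >= detect_threshold means the detector flags the image as fake;
    - cosine similarity >= identity_threshold means identity is preserved.
    """
    n = len(scores_before)
    if n == 0 or not (len(scores_after) == len(emb_before) == len(emb_after) == n):
        raise ValueError("inputs must be non-empty and of equal length")

    flagged_before = [s >= detect_threshold for s in scores_before]
    flagged_after = [s >= detect_threshold for s in scores_after]
    same_identity = [
        cosine_similarity(e0, e1) >= identity_threshold
        for e0, e1 in zip(emb_before, emb_after)
    ]
    # An image counts toward the joint rate only if it is no longer flagged
    # AND its embedding still matches the original identity.
    evades_and_preserves = [
        (not flagged) and same
        for flagged, same in zip(flagged_after, same_identity)
    ]

    return {
        "detection_rate_before": sum(flagged_before) / n,
        "detection_rate_after": sum(flagged_after) / n,
        "identity_preserved_rate": sum(same_identity) / n,
        "evade_and_preserve_rate": sum(evades_and_preserves) / n,
    }
```

Reporting the joint rate alongside the two marginal rates matters for a defender: a drop in detection rate is only alarming if the refined images still depict the same person, so the two conditions are checked per image rather than averaged separately.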

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Spam and Phishing Detection · Misinformation and Its Impacts