Explainability-Based Token Replacement on LLM-Generated Text

Hadi Mohammadi, Anastasia Giachanou, Daniel L. Oberski, and Ayoub Bagheri

arXiv:2506.04050 · cs.CL · January 6, 2026

Open Access

TL;DR

This paper explores how explainable AI techniques can be used to modify AI-generated text to evade detection, and proposes ensemble detection methods to counteract such manipulations.

Contribution

The paper introduces explainability-based token replacement strategies that reduce the detectability of AI-generated text, and shows that ensemble classifiers keep detection robust against them.

Findings

01. Token replacement reduces single classifier detectability

02. Ensemble classifiers remain effective across languages and domains

03. Explainability methods can identify influential tokens for manipulation

Abstract

Generative models, especially large language models (LLMs), have shown remarkable progress in producing text that appears human-like. However, they often exhibit patterns that make their output easier to detect than text written by humans. In this paper, we investigate how explainable AI (XAI) methods can be used to reduce the detectability of AI-generated text (AIGT) while also introducing a robust ensemble-based detection approach. We begin by training an ensemble classifier to distinguish AIGT from human-written text, then apply SHAP and LIME to identify tokens that most strongly influence its predictions. We propose four explainability-based token replacement strategies to modify these influential tokens. Our findings show that these token replacement approaches can significantly diminish a single classifier's ability to detect AIGT. However, our ensemble classifier maintains strong…
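The pipeline the abstract describes (score a text with a detector, rank tokens by their influence on the score, then replace the most influential ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_detector`, `AI_MARKERS`, and `SUBSTITUTES` are hypothetical, the attribution is a simple leave-one-out masking used as a stand-in for SHAP or LIME, and the dictionary lookup is not necessarily one of the paper's four replacement strategies.

```python
from typing import Callable, Dict, List

# Hypothetical weights: how strongly each word pushes the toy detector toward "AI".
AI_MARKERS: Dict[str, float] = {"delve": 0.9, "moreover": 0.6, "tapestry": 0.8}
# Hypothetical substitutes for influential tokens.
SUBSTITUTES: Dict[str, str] = {"delve": "look", "moreover": "also", "tapestry": "mix"}
MASK = "[MASK]"

Detector = Callable[[List[str]], float]


def toy_detector(tokens: List[str]) -> float:
    """Toy stand-in for a trained detector: mean marker weight, in [0, 1]."""
    if not tokens:
        return 0.0
    return sum(AI_MARKERS.get(t.lower(), 0.0) for t in tokens) / len(tokens)


def token_influence(tokens: List[str], detector: Detector) -> List[float]:
    """Leave-one-out attribution: score drop when each token is masked.

    This is a simple stand-in for SHAP/LIME, not either method itself.
    """
    base = detector(tokens)
    return [
        base - detector(tokens[:i] + [MASK] + tokens[i + 1:])
        for i in range(len(tokens))
    ]


def replace_top_k(tokens: List[str], detector: Detector, k: int = 1) -> List[str]:
    """Replace the k most influential tokens that have a known substitute."""
    scores = token_influence(tokens, detector)
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    out = list(tokens)
    for i in ranked[:k]:
        if scores[i] > 0:  # only touch tokens that raise the "AI" score
            out[i] = SUBSTITUTES.get(out[i].lower(), out[i])
    return out


if __name__ == "__main__":
    text = "Moreover we delve into the data".split()
    rewritten = replace_top_k(text, toy_detector, k=1)
    print(rewritten, toy_detector(text), toy_detector(rewritten))
```

With a real detector, `toy_detector` would be swapped for the classifier's predicted probability, and the masking loop for an actual SHAP or LIME explainer; the ranking and replacement steps would stay structurally the same.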

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis